The top answer to this question helped me divide a vector into chunks of equal size:
Split a vector into chunks in R
My problem now is that I would like to construct data frames out of the output. Here is the problem in R syntax:
d <- rpois(73,5)
solution1 <- split(d, ceiling(seq_along(d)/20))
ERROR <- as.data.frame(solution1)
The error you should see is "arguments imply differing number of rows". I'm especially confused because I thought the as.data.frame() function could handle this problem, as shown here:
http://www.r-bloggers.com/converting-a-list-to-a-data-frame-2/
Thanks for all your help!
EDIT 1:
I am close to a solution with this line; however, it introduces NA values that distort the output I want:
ldply(solution1,data.frame)
ldply() is from the plyr package.
Did you read the ?split help page? Did you notice the unsplit() function? That sounds like exactly what you're trying to do here.
d <- rpois(73,5)
f <- ceiling(seq_along(d)/20) #factor for splitting
solution1 <- split(d, f)
unsplit(solution1 , f)
I'm not sure what you expected your data.frame to look like, but the error message you got was because as.data.frame() was trying to create a new column in your data.frame for each item in solution1. Since the vectors in the list do not all have the same number of elements (the last chunk is shorter), you cannot make a data.frame from them: a data.frame requires that every column has the same number of rows.
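Because the chunks have lengths 20, 20, 20 and 13, a data.frame can only be built if you either keep one row per element with the chunk number in its own column (long format), or pad the short chunk with NA (wide format). A minimal sketch of both, assuming one of those shapes is what you are after; the names `long`, `wide` and `chunk` are illustrative, not from the question:

```r
set.seed(1)                          # only for reproducibility
d <- rpois(73, 5)
f <- ceiling(seq_along(d) / 20)      # chunk index: 1..4
solution1 <- split(d, f)

# Long format: one row per element, chunk label in its own column
long <- data.frame(chunk = factor(f), value = d)

# Wide format: pad every chunk with NA up to the longest chunk length
max_len <- max(lengths(solution1))
wide <- as.data.frame(lapply(solution1, `length<-`, max_len))
```

The long format keeps all 73 values without any NA; the wide format always contains NA wherever a chunk is shorter than the longest one.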
Related
I have two data frames. Applying the same dcast() function to both gives me different results in the output. Both datasets have the same structure but different sizes. The first one has more than 950 rows:
The code I apply is:
trans_matrix_complete <- mod_attrib$transition_matrix
trans_matrix_complete[which(trans_matrix_complete$channel_from=="_3RDLIVE"),]
trans_matrix_complete <- rbind(trans_matrix_complete, df_dummy)
trans_matrix_complete$channel_to <- factor(trans_matrix_complete$channel_to,
levels = c(levels(trans_matrix_complete$channel_to)))
trans_matrix_complete <- dcast(trans_matrix_complete,
channel_from ~ channel_to,value.var = 'transition_probability')
And the trans_matrix_complete output I get is the following:
Something is not working as it should, because with the smaller data frame of just a few lines I get the following outcome:
Here:
a) the row numbering is different; I'm not sure why there are two dots listed in the first case.
b) also, trying to assign row names to the data frame with
row.names(trans_matrix_complete) <- trans_matrix_complete$channel_from
does not work for the large data frame: after the assignment, it still shows up exactly as in the first image, without names assigned to rows.
Any idea about this weird behavior?
I resolved this by moving from dcast() to spread() from the tidyr package (part of the tidyverse), using the following call:
trans_matrix_complete<-spread(trans_matrix_complete,
channel_to,transition_probability)
After applying spread() to both data frames, the output has the same format and accepts row names without any issue.
So I suspect it is all related to the fact that dcast() and the reshape2 package are no longer actively maintained.
Regards
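If all you need is a plain matrix with row names, one package-free alternative (swapped in here; not what the poster used) is base R tapply(), which cross-tabulates the probabilities by (channel_from, channel_to) and returns an ordinary matrix whose row names come from channel_from. A minimal sketch on a hypothetical four-row table that reuses the post's column names, assuming each (from, to) pair appears only once:

```r
# Hypothetical long-format transition table (column names as in the post)
trans <- data.frame(
  channel_from = c("A", "A", "B", "B"),
  channel_to   = c("B", "C", "A", "C"),
  transition_probability = c(0.4, 0.6, 0.7, 0.3)
)

# Cross-tabulate value by (from, to); rows = channel_from, cols = channel_to.
# sum() is only a combiner: with unique pairs it returns the value itself.
trans_matrix <- tapply(trans$transition_probability,
                       list(trans$channel_from, trans$channel_to),
                       sum)
```

Pairs that never occur come out as NA; replace them with 0 if the downstream code expects a complete transition matrix.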
I need to check whether a data frame is "empty" or not ("empty" in the sense that the data frame contains no finite values; if there is a mix of finite and non-finite values, it should NOT be considered "empty").
Referring to How to check a data.frame for any non-finite, I came up with one line of code that almost achieves this objective:
nrow(tmp[rowSums(sapply(tmp, function(x) is.finite(x))) > 0,]) == 0
where tmp is some data frame.
This code works fine in most cases, but it fails if the data frame contains a single row.
For example, the above code works fine for
tmp <- data.frame(a=c(NA,NA), b=c(NA,NA)) OR tmp <- data.frame(a=c(3,NA), b=c(4,NA))
But not for,
tmp <- data.frame(a=NA, b=NA)
because I think rowSums expects at least two rows
I looked at some other posts such as https://stats.stackexchange.com/questions/6142/how-to-calculate-the-rowmeans-with-some-single-rows-in-data, but I still couldn't come up with a solution for my problem.
My question is: are there any clean ways (i.e. avoiding loops, ideally a one-liner) to check whether any data frame is "empty"?
Thanks
If you are checking all columns, then you can just do
!any(sapply(tmp, is.finite))
This returns TRUE only when no value in tmp is finite. Here we use any() rather than the rowSums() trick, so we don't have to worry about whether sapply() returns a matrix or a plain vector.
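The single-row failure is not really about rowSums() needing two rows: with one row, each column's is.finite() result has length 1, so sapply() simplifies to a named vector instead of a matrix, and rowSums() stops because its input is not an array of at least two dimensions. Given the question's definition of "empty" (no finite value anywhere), a minimal sketch that sidesteps sapply()'s shape rules entirely is below; the helper name `is_empty_df` is hypothetical:

```r
# Hypothetical helper: TRUE when the data frame holds no finite value at all.
is_empty_df <- function(tmp) {
  # lapply() always returns a list, so unlist() gives one flat logical vector
  # whatever the number of rows (including zero rows)
  !any(unlist(lapply(tmp, is.finite)))
}

all_na  <- data.frame(a = c(NA, NA), b = c(NA, NA))
mixed   <- data.frame(a = c(3, NA),  b = c(4, NA))
one_row <- data.frame(a = NA, b = NA)

is_empty_df(all_na)    # TRUE
is_empty_df(mixed)     # FALSE
is_empty_df(one_row)   # TRUE
```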
I have 3 vectors: MRI, MRI_high and MRI_low, with _low and _high being half the length of the first. My objective is to put them into one object so that I can iterate over them (I have a bunch of vectors following the same format).
When I wrote data.entry(MRI, MRI_high, MRI_low), a window popped up with my data arranged in columns and of the correct length; the problem is that I can't use that.
When I used MRI_vector <- data.frame(MRI, MRI_high, MRI_low), the function somehow gave me 3 columns of equal length by duplicating the shorter vectors.
What is a solution to this? And do data frames need equal lengths for their columns?
Moreover, I tried using lists; however, my values are then not numeric.
One option is to place them in a list and pad with NA to make the lengths equal before converting to a data.frame:
lst <- mget(ls(pattern = "MRI.*"))
df1 <- data.frame(lapply(lst, `length<-`, max(lengths(lst))))
I found a solution to my problem:
I used the list, and when computing statistics I used as.numeric(unlist()), which still allows me to iterate.
Any further comments that help me understand this strange data-frame behaviour are still appreciated!
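The "strange" behaviour is recycling: data.frame() requires every column to have the same number of rows, and when a shorter vector's length divides the longest one evenly, it is repeated to fill the column (otherwise you get the "arguments imply differing number of rows" error). A minimal sketch with made-up toy values; it also shows that list elements stay numeric when extracted with [[ rather than [, which is one common reason a list can seem to hold non-numeric values:

```r
# Made-up toy values standing in for the real MRI vectors
MRI      <- c(10, 20, 30, 40)
MRI_high <- c(35, 40)
MRI_low  <- c(10, 20)

# data.frame() recycles the length-2 vectors to fill 4 rows
recycled <- data.frame(MRI, MRI_high, MRI_low)
recycled$MRI_high          # 35 40 35 40

# A list keeps each vector at its own length
lst <- list(MRI = MRI, MRI_high = MRI_high, MRI_low = MRI_low)
class(lst["MRI_high"])     # "list": single brackets return a sub-list
class(lst[["MRI_high"]])   # "numeric": double brackets return the vector

# Iterating over the list works directly on the numeric vectors
means <- sapply(lst, mean)

# Padding with NA (as in the answer) avoids recycling
padded <- data.frame(lapply(lst, `length<-`, max(lengths(lst))))
```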
Suppose I've got a data frame called someMatrix. In this data frame I want to replace only the first three rows of the 4th column.
I came up with this idea.
(someMatrix[,4])[1:3] <- replacement
but I get following error: could not find function "(<-"
Any idea how I could solve this?
Thanks!
You may subset with brackets as many times as you want, without bothering with parentheses:
a <- cbind(rnorm(10), rnorm(10))
a[1:5, ][2:3, ][, 2][1]
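Chained brackets also work on the left-hand side of <-, so the question's assignment only needs the parentheses dropped. A minimal sketch, assuming someMatrix is a data frame with at least three rows and four columns (here a random 5 x 4 one) and that replacement holds three values:

```r
set.seed(42)
someMatrix  <- data.frame(matrix(rnorm(20), nrow = 5))  # 5 rows, 4 columns
replacement <- c(100, 200, 300)

# Most direct form: row and column indices in one bracket call
someMatrix[1:3, 4] <- replacement

# Chained brackets without parentheses are also valid as an assignment target
someMatrix[, 4][1:3] <- replacement * 2
```

The single-bracket form someMatrix[1:3, 4] is the more common idiom; the chained form is shown only because it mirrors the question's attempt.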
I am having trouble turning my data.frame into a matrix. Because I wanted to change my data.frame with mostly factor variables into a numeric matrix, I used the following code:
UN2010frame <- data.matrix(lapply(UN2010, as.numeric))
However, when I checked the mode of UN2010frame, it still showed up as a list. Because the code I want to run (Ordrating) does not accept data in list format, I used UN2010matrix <- unlist(UN2010frame) to unlist my matrix. When I did this, my first row (which was formerly a row with column names) turned into NAs. This was a problem for me because when I tried to run an ordinal IRT model using this data set, I got the following error message:
Error in 1:nrow(Y) : argument of length 0
I think it is because all the values in my first row are now gone.
If you could help me on any front, it would be deeply appreciated.
Thank you very much!
Haillie
First, the correct use of data.matrix is:
data.matrix(UN2010)
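as it converts to numeric automatically. The lapply() in your code is the first source of the error you get: lapply() returns a list, not a data frame, and data.matrix() hands anything that is not a data frame straight to as.matrix(). For a list, that gives a one-column matrix of mode "list" (each cell holds a whole column), not a numeric matrix, which is why mode() still reports a list.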
Second, unlist() returns a vector, not a matrix. So I'm pretty sure you won't find a "first row with NA", as you have a vector, which might explain part of your confusion.
You probably have a character column somewhere. Converting it to numeric gives NA. If you don't want this, exclude such columns from further analysis. One possibility is to use colwise() from the plyr package to convert only the factors:
colwise(as.numeric,is.factor)(UN2010)
This returns a data frame containing only the factor columns, converted to numeric, which can easily be turned into a matrix with data.matrix() or as.matrix(). Alternatively, you can use the base R solution:
id <- sapply(UN2010,is.character)
sapply(UN2010[!id],as.numeric)
which returns a matrix with all non-character columns converted to numeric. If you really want to keep the data frame with all original columns, you can do:
UN2010frame <- UN2010
UN2010frame[!id] <- lapply(UN2010[!id],as.numeric)
Toy example code:
UN2010 <- data.frame(
F1 = factor(rep(letters[1:3],10)),
F2 = factor(rep(letters[5:10],5)),
Char = rep(letters[11:16],each=5),
Num = 1:30,
stringsAsFactors=FALSE
)
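Running the approaches above on the toy data makes the shapes visible. A minimal sketch that rebuilds the toy UN2010 and compares data.matrix() on the data frame, the base R sapply() route, the column-preserving lapply() route, and the list-mode matrix that data.matrix(lapply(...)) produces; the names m, num_only and bad are illustrative, and suppressWarnings() only silences the NA-coercion warning for the character column:

```r
UN2010 <- data.frame(
  F1 = factor(rep(letters[1:3], 10)),
  F2 = factor(rep(letters[5:10], 5)),
  Char = rep(letters[11:16], each = 5),
  Num = 1:30,
  stringsAsFactors = FALSE
)

# Correct use: a numeric (integer) matrix, 30 x 4
m <- data.matrix(UN2010)

# Base R route: drop character columns, convert the rest
id <- sapply(UN2010, is.character)
num_only <- sapply(UN2010[!id], as.numeric)        # 30 x 3 numeric matrix

# Keep every column, converting only the non-character ones
UN2010frame <- UN2010
UN2010frame[!id] <- lapply(UN2010[!id], as.numeric)

# The question's version: a list goes in, a list-mode matrix comes out
bad <- suppressWarnings(data.matrix(lapply(UN2010, as.numeric)))
mode(bad)                                          # "list"
```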
Try as.data.frame instead of data.matrix.