I have two dataframes. Applying the same dcast() call to each of them gives me different output. Both datasets have the same structure but different sizes. The first one has more than 950 rows:
The code I apply is:
trans_matrix_complete <- mod_attrib$transition_matrix
trans_matrix_complete[which(trans_matrix_complete$channel_from=="_3RDLIVE"),]
trans_matrix_complete <- rbind(trans_matrix_complete, df_dummy)
trans_matrix_complete$channel_to <- factor(trans_matrix_complete$channel_to,
levels = c(levels(trans_matrix_complete$channel_to)))
trans_matrix_complete <- dcast(trans_matrix_complete,
channel_from ~ channel_to,value.var = 'transition_probability')
And the trans_matrix_complete output I get is the following:
Something is not working as it should, because with the smaller dataframe of just a few lines I get the following outcome:
Where
a) the row numbering is different. I'm not sure why there are two dots listed in the first case
b) also, trying to assign rownames to the dataframe with
row.names(trans_matrix_complete) <- trans_matrix_complete$channel_from
does not work for the large dataframe: despite the row.names call, the dataframe shows up exactly as in the first image, without names assigned to the rows.
Any idea about this weird behavior?
I resolved this by moving from dcast() to spread() from the tidyr package (part of the tidyverse), using the following call:
trans_matrix_complete<-spread(trans_matrix_complete,
channel_to,transition_probability)
With spread(), the matrix output for the two dataframes has the same format and accepts rownames without any issue.
So I suspect it is all related to the fact that dcast() and the reshape2 package are not maintained anymore.
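For comparison, here is a minimal base-R sketch of the same long-to-wide step followed by the row-name assignment. The toy data frame (channel names and probabilities) is made up for illustration and is not the original transition matrix; only the column names are taken from the post.

```r
# Hypothetical toy data using the column names from the post
toy <- data.frame(channel_from = c("A", "A", "B"),
                  channel_to = c("B", "C", "C"),
                  transition_probability = c(0.4, 0.6, 1.0),
                  stringsAsFactors = FALSE)

# Long to wide: one row per channel_from, one column per channel_to
wide <- reshape(toy, idvar = "channel_from", timevar = "channel_to",
                direction = "wide")

# Strip the "transition_probability." prefix that reshape() adds
names(wide) <- sub("^transition_probability\\.", "", names(wide))

# Assign row names from the channel_from column
rownames(wide) <- wide$channel_from
wide
```

Because the result is a plain data.frame, the row names stay attached and show up when the object is printed.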
Regards
Related
I have a dataset that is like this: list
df
200000
5666666
This dataset continues to 5551
Another dataset also has 5551 observations. I want to merge the list dataset with this other dataset, but they have no variable in common. Only the row names are the same.
I ran:
merge(list,df,by="rownames")
The error message says that it should be a valid column name.
I also tried merge_all, but it did not work.
Could someone please help?
It's good practice to be more precise with the naming of your dataframe variables. I wouldn't use list but something like df_description. Either way, merging by rownames can be achieved by using by = "row.names" or by = 0. You can read more on merge() in the documentation (under "Details").
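A minimal sketch of the row-name merge, assuming two small made-up data frames (df_a, df_b and their values are hypothetical stand-ins for the real 5551-row data):

```r
# Hypothetical data frames that share row names but no columns
df_a <- data.frame(x = c(200000, 5666666), row.names = c("r1", "r2"))
df_b <- data.frame(y = c("a", "b"), row.names = c("r2", "r1"),
                   stringsAsFactors = FALSE)

# by = "row.names" (or by = 0) matches rows on their row names
merged <- merge(df_a, df_b, by = "row.names")

# merge() returns the row names as a column called Row.names;
# move them back into the row names if needed
rownames(merged) <- as.character(merged$Row.names)
merged$Row.names <- NULL
merged
```

Note that the rows are matched by name, not by position, so the differing row order of df_a and df_b does not matter here.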
Probably it is a typo.
I want to transpose a data frame that has both numeric and character columns. I have some lines where the id is repeated 2 or even more times. I would like to have a final dataframe where I have this data in one line.
I thought about using both the data.table and reshape2 library (they have similar functions) but I can't find the right combination to do what I want and I'm going crazy. Could someone give me some help?
Here is a modified example of my data:
example_data <-data.frame(cod=c(20,20,20,20,20,20,20,40,80,80,80,80,80,240),
id=c(44,68,137,150,186,236,289,236,44,150,155,236,68,289),
textVar=c('aaaa','aaaa','aaaa bbbb','aaaa','cccc','cccc','cccc bbb','dddd','dddd cccc','dddd','ffff','ffff gggg','ffff','hhhh'),
ww=c(4,4,4,4,4,4,4,45,118,118,118,118,118,118))
For example, for the rows with id=44, my desired output looks like this:
exampleRow <-data.frame(cod_1=c(20),id=c(44),textVar_1=c('aaaa'),ww_1=c(4),cod_2=c(80),id=c(44),textVar_2=c('dddd cccc'),ww_2=c(118))
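One possible approach, sketched here in base R rather than with data.table or reshape2: number the repeats of each id, then reshape to wide format on that counter. The helper column n is an assumption introduced for this sketch, and the result keeps a single id column instead of repeating it.

```r
example_data <- data.frame(
  cod = c(20, 20, 20, 20, 20, 20, 20, 40, 80, 80, 80, 80, 80, 240),
  id = c(44, 68, 137, 150, 186, 236, 289, 236, 44, 150, 155, 236, 68, 289),
  textVar = c('aaaa', 'aaaa', 'aaaa bbbb', 'aaaa', 'cccc', 'cccc', 'cccc bbb',
              'dddd', 'dddd cccc', 'dddd', 'ffff', 'ffff gggg', 'ffff', 'hhhh'),
  ww = c(4, 4, 4, 4, 4, 4, 4, 45, 118, 118, 118, 118, 118, 118),
  stringsAsFactors = FALSE)

# Helper column (not in the original data): 1, 2, ... within each id
example_data$n <- ave(example_data$id, example_data$id, FUN = seq_along)

# One row per id; cod, textVar and ww get suffixes _1, _2, ...
wide <- reshape(example_data, idvar = "id", timevar = "n",
                direction = "wide", sep = "_")

wide[wide$id == 44, ]
```

Ids that occur fewer times than the maximum get NA in the unused columns.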
I am trying to split a column of a dataframe into 2 columns using transform and colsplit from the reshape package. I don't get what I am doing wrong. Here's an example...
library(reshape)
df1 <- data.frame(col1=c("x-1","y-2","z-3"))
Now I am trying to split col1 into col1.a and col1.b at the delimiter '-'. The following is my code...
df1 <- transform(df1,col1 = colsplit(col1,split='-',names = c('a','b')))
Now in RStudio, when I do View(df1), I do see col1.a and col1.b split the way I want.
But when I run...
df1$col1.a or head(df1$col1.a) I get NULL. Apparently I am not able to perform any further operations on these split columns. What exactly is wrong with this?
colsplit returns a list. The easiest (and idiomatic) way to assign these to multiple columns in the data frame is to use [<-,
e.g.
df1[c('col1.a','col1.b')] <- colsplit(df1$col1,'-',c('a','b'))
It will be much harder to do this within transform (see "Assign multiple new variables on LHS in a single line in R").
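The same multi-column assignment can be sketched without the reshape package. Here strsplit() stands in for colsplit(), which is a substitution made for this example and not what the answer itself uses; the point being illustrated is the [<- assignment to several new columns at once.

```r
df1 <- data.frame(col1 = c("x-1", "y-2", "z-3"), stringsAsFactors = FALSE)

# Split each value at "-" and stack the pieces into a two-column matrix
parts <- do.call(rbind, strsplit(as.character(df1$col1), "-", fixed = TRUE))

# Assign both new columns in one step with [<-
df1[c("col1.a", "col1.b")] <- data.frame(a = parts[, 1],
                                         b = as.integer(parts[, 2]),
                                         stringsAsFactors = FALSE)

df1$col1.a
df1$col1.b
```

After this assignment col1.a and col1.b are ordinary top-level columns, so df1$col1.a no longer returns NULL.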
I am a naive user of R and am attempting to come to terms with the 'apply' series of functions which I now need to use due to the complexity of the data sets.
I have a large, ragged data frame that I wish to reshape before conducting a sequence of regression analyses. It is further complicated by having interlaced rows of descriptive data (characters).
My approach to date has been to use a factor to split the data frame into sets with equal row lengths (i.e. a list), then remove the trailing empty columns, make two new matching lists (one of data and one of chars), use reshape to produce a common column number, and finally recombine the sets in each list. A simplified example:
myDF <- as.data.frame(rbind(c("v1",as.character(1:10)),
c("v1",letters[1:10]),
c("v2",c(as.character(1:6),rep("",4))),
c("v2",c(letters[1:6], rep("",4)))))
myDF[,1] <- as.factor(myDF[,1])
myList <- split(myDF, myDF[,1])
myList[[1]]
I can remove the empty columns for an individual set and can split the data frame into two sets from the interlacing rows, but I am stumped by the syntax for writing a function that applies the following operation to every element of the list, though 'lapply' with 'seq_along' should do it?
Thus for the individual set:
DF <- myList[[2]]
DF <- DF[,!sapply(DF, function(x) all(x==""))]
DF
(taken from an earlier answer to a similar, but simpler, example on this site). I have a large data set and would like an elegant solution (I could use a loop, but that would not use the capabilities of R effectively). Once I have done that, I ought to be able to use the same rationale to reshape the frames and then recombine them.
regards
jac
Try
lapply(split(myDF, myDF$V1), function(x) x[!colSums(x=='')])
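A sketch of the same idea that reuses the all-empty test from the question inside lapply(). The helper name drop_empty_cols is made up for this example. Unlike the colSums() one-liner, which drops a column as soon as it contains any empty string, this version drops a column only when every value in it is empty. On the example data both give the same result.

```r
myDF <- as.data.frame(rbind(c("v1", as.character(1:10)),
                            c("v1", letters[1:10]),
                            c("v2", c(as.character(1:6), rep("", 4))),
                            c("v2", c(letters[1:6], rep("", 4)))),
                      stringsAsFactors = FALSE)

# Hypothetical helper: keep only columns that are not entirely ""
drop_empty_cols <- function(d) {
  d[, !sapply(d, function(col) all(col == "")), drop = FALSE]
}

# Split by the first column, then trim each piece independently
trimmed <- lapply(split(myDF, myDF$V1), drop_empty_cols)

sapply(trimmed, ncol)
```

No seq_along() is needed here, since lapply() passes each data frame in the list to the function directly.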
I have a df with over 30 columns and over 200 rows, but for simplicity will use an example with 8 columns.
X1<-c(sample(100,25))
B<-c(sample(4,25,replace=TRUE))
C<-c(sample(2,25,replace =TRUE))
Y1<-c(sample(100,25))
Y2<-c(sample(100,25))
Y3<-c(sample(100,25))
Y4<-c(sample(100,25))
Y5<-c(sample(100,25))
df<-cbind(X1,B,C,Y1,Y2,Y3,Y4,Y5)
df<-as.data.frame(df)
I wrote a function that melts the data and generates a plot, with X1 giving the x-axis values and facets defined by the values in B and C.
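library(reshape2)  # melt()
library(ggplot2)   # ggplot(), facet_grid(), ggsave()
plotdata<-function(l){
# reshape the chosen Y column to long format, keeping X1, B and C as ids
melted<-melt(df,id.vars=c("X1","B","C"),measure.vars=l)
p<-ggplot(melted,aes(x=X1,y=value))+geom_point()
# facet rows by B and columns by C
p2<-p+facet_grid(B ~ C)
ggsave(filename=paste("X_vs_",l,"_faceted.jpeg",sep=""),plot=p2)
}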
I can then manually input the required Y variable
plotdata("Y1")
I don't want to generate plots for all columns. I could just type the column of interest into plotdata and then get the result, but this seems quite inelegant (and time consuming). I would prefer to be able to manually specify the columns of interest e.g. "Y1","Y3","Y4" and then write a loop function to do all those specified.
However, I am new to writing for loops and can't find a way to loop over the specific column names that are required for my function to work. A standard for(i in 1:length(df)) wouldn't be appropriate because I only want to loop over the user-specified columns.
Apologies if there is already an answer to this on Stack Overflow. I couldn't find it if there was.
Thanks to Roland for providing the following answer:
Try
for (x in c("Y1","Y3","Y4")) {plotdata(x)}
The index variable doesn't have to be numeric.
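A small sketch of looping over a character vector of column names. To keep it runnable without ggplot2, plotdata_stub is a hypothetical stand-in that only builds the file name plotdata() would save to; in real use, plotdata() itself would be called in its place.

```r
# Hypothetical stand-in for plotdata(): returns the output file name only
plotdata_stub <- function(l) paste("X_vs_", l, "_faceted.jpeg", sep = "")

# The user-specified columns of interest
cols <- c("Y1", "Y3", "Y4")

# for loop over the names themselves, not over numeric positions
files <- character(0)
for (col in cols) {
  files <- c(files, plotdata_stub(col))
}

# Equivalent without an explicit loop
files_vapply <- vapply(cols, plotdata_stub, character(1), USE.NAMES = FALSE)

files
```

With the real function, invisible(lapply(cols, plotdata)) would be another way to run it once per selected column.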