It is probably a typo.
I want to transpose a data frame that has both numeric and character columns. Some ids appear in two or more rows, and I would like a final data frame in which all the data for each id sits on a single row.
I thought about using the data.table and reshape2 libraries (they have similar functions), but I can't find the right combination to do what I want and I'm going crazy. Could someone give me some help?
Here is a modified example of my data:
example_data <-data.frame(cod=c(20,20,20,20,20,20,20,40,80,80,80,80,80,240),
id=c(44,68,137,150,186,236,289,236,44,150,155,236,68,289),
textVar=c('aaaa','aaaa','aaaa bbbb','aaaa','cccc','cccc','cccc bbb','dddd','dddd cccc','dddd','ffff','ffff gggg','ffff','hhhh'),
ww=c(4,4,4,4,4,4,4,45,118,118,118,118,118,118))
For example, for the rows with id=44, my desired output looks like this:
exampleRow <-data.frame(cod_1=c(20),id=c(44),textVar_1=c('aaaa'),ww_1=c(4),cod_2=c(80),id=c(44),textVar_2=c('dddd cccc'),ww_2=c(118))
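One possible way to get there, sketched here in base R rather than with data.table or reshape2: number the occurrences of each id and let reshape() spread the remaining columns by that counter. The helper column n and the result name wide are names chosen for this illustration, and the sketch assumes the rows of example_data are already in the order in which the _1, _2, ... suffixes should be assigned.

```r
# Example data from the question.
example_data <- data.frame(
  cod = c(20, 20, 20, 20, 20, 20, 20, 40, 80, 80, 80, 80, 80, 240),
  id = c(44, 68, 137, 150, 186, 236, 289, 236, 44, 150, 155, 236, 68, 289),
  textVar = c("aaaa", "aaaa", "aaaa bbbb", "aaaa", "cccc", "cccc", "cccc bbb",
              "dddd", "dddd cccc", "dddd", "ffff", "ffff gggg", "ffff", "hhhh"),
  ww = c(4, 4, 4, 4, 4, 4, 4, 45, 118, 118, 118, 118, 118, 118)
)

# Hypothetical helper column: running count of each id's occurrences (1, 2, ...).
example_data$n <- ave(example_data$id, example_data$id, FUN = seq_along)

# One row per id; cod, textVar and ww become cod_1, textVar_1, ww_1, cod_2, ...
wide <- reshape(example_data, idvar = "id", timevar = "n",
                direction = "wide", sep = "_")

# Inspect the row for id 44.
wide[wide$id == 44, ]
```

Ids that occur fewer times than the most frequent id get NA in the unused columns, and id appears only once in the result rather than once per block.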
I have a dataset that is like this: list
df
200000
5666666
This dataset continues for 5551 rows.
Another dataset also has 5551 observations. I want to merge the list dataset with this other dataset, but they have no variable in common; only the row names are the same.
I tried this:
merge(list,df,by="rownames")
The error message says that it should be a valid column name.
I also tried merge_all, but it did not work.
Could someone please help?
It's good practice to be more precise with the naming of your dataframe variables. I wouldn't use list but something like df_description. Either way, merging by rownames can be achieved by using by = "row.names" or by = 0. You can read more on merge() in the documentation (under "Details").
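As a minimal sketch of merging on row names, assuming two small made-up data frames (df_description and df_other, with invented values and row names) that share row names but no columns:

```r
# Two toy data frames that share row names but no columns
# (names and values are made up for illustration).
df_description <- data.frame(value = c(200000, 5666666, 31),
                             row.names = c("r1", "r2", "r3"))
df_other <- data.frame(label = c("c", "a", "b"),
                       row.names = c("r3", "r1", "r2"))

# by = "row.names" (or by = 0) matches rows on their row names.
merged <- merge(df_description, df_other, by = "row.names")

# merge() returns the row names as a column called Row.names;
# move it back into the row names if that is what you need.
rownames(merged) <- merged$Row.names
merged$Row.names <- NULL
merged
```

Note the dot in "row.names": by = "rownames" is treated as an ordinary column name, which is why the original call failed.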
This probably has a simple fix, but I'm relatively new to using R and could use some assistance.
The toy data I'm using for a gene network analysis has rows that look like this:
whereas the data that I've uploaded has rows that look like this:
The code I'm using treats the row names as gene names. I am able to run the analysis successfully; however, the output I end up with has lists of row numbers where there should be lists of gene names.
Is there a simple way that I can convert my data into the toy data format so that the row names are gene names instead of numbers?
I have two data frames. Applying the same dcast() call to both gives me different output. Both datasets have the same structure but different sizes. The first one has more than 950 rows:
The code I apply is:
trans_matrix_complete <- mod_attrib$transition_matrix
trans_matrix_complete[which(trans_matrix_complete$channel_from=="_3RDLIVE"),]
trans_matrix_complete <- rbind(trans_matrix_complete, df_dummy)
trans_matrix_complete$channel_to <- factor(trans_matrix_complete$channel_to,
levels = c(levels(trans_matrix_complete$channel_to)))
trans_matrix_complete <- dcast(trans_matrix_complete,
channel_from ~ channel_to,value.var = 'transition_probability')
And the trans_matrix_complete output I get is the following:
Something is not working as it should, because with the smaller data frame of just a few rows I get the following outcome:
Where
a) the row numbers are different. I'm not sure why there are two dots listed in the first case
b) trying to assign row names to the data frame with
row.names(trans_matrix_complete) <- trans_matrix_complete$channel_from
does not work for the large data frame: despite the row.names assignment, the data frame shows up exactly as in the first image, without names assigned to the rows.
Any idea about this weird behavior?
I resolved this by moving from dcast() to spread() from the tidyr package (part of the tidyverse), using the following call:
trans_matrix_complete<-spread(trans_matrix_complete,
channel_to,transition_probability)
After applying spread() to the two data frames, the output has the same format in both cases and accepts row names without any issue.
So I suspect it is all related to the fact that dcast() and the reshape2 package are no longer actively maintained.
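For reference, a small self-contained sketch of the spread() approach. The column names follow the question, but the toy table trans_long, its channel names and its probabilities are invented for illustration, and the as.data.frame() step is a precaution I am assuming is harmless rather than something the original data is known to need.

```r
library(tidyr)

# Toy long-format transition table; the column names follow the question,
# the channel names and probabilities are invented.
trans_long <- data.frame(
  channel_from = c("(start)", "(start)", "email", "email"),
  channel_to = c("email", "search", "search", "(conversion)"),
  transition_probability = c(0.6, 0.4, 0.3, 0.7)
)

# One row per channel_from, one column per channel_to.
trans_wide <- spread(trans_long, channel_to, transition_probability)

# Make sure we have a plain data frame before assigning row names
# (tibbles do not keep row names).
trans_wide <- as.data.frame(trans_wide)
row.names(trans_wide) <- trans_wide$channel_from
trans_wide
```

In current tidyr releases spread() is itself superseded by pivot_wider(), so the same reshaping can also be written with names_from and values_from.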
Regards
I'm rather new to the tidyverse, and I want to learn, so this question is specifically about doing this the tibble way, using things like select(), mutate() and the like. I know how to achieve the desired effect with data frames matching column indices.
I have a rather large tibble, containing columns named Day1, Day2, ..., Day48, among others. I'd like to add columns of averages for every week, using regular expressions (assume the column names could be more complicated). How would I achieve this?
Figured it out:
data <- mutate(data, Week1=select(data, matches("^Day[1-7]$")) %>% rowMeans(na.rm=T))
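The same pattern can be extended to every week. The sketch below is one way to do it, assuming a toy tibble with only Day1 to Day14; the loop variable w, the helper names days and pattern, and the random values are all made up for illustration. The regular expression is built from the day numbers because a character class like [1-7] no longer works once the numbers have two digits.

```r
library(dplyr)

# Toy data: 3 rows, columns Day1 ... Day14 (random values for illustration).
set.seed(1)
m <- matrix(rnorm(3 * 14), nrow = 3,
            dimnames = list(NULL, paste0("Day", 1:14)))
data <- as_tibble(as.data.frame(m))

for (w in 1:2) {
  # Day numbers belonging to week w, e.g. 8:14 for week 2.
  days <- ((w - 1) * 7 + 1):(w * 7)
  # Anchored regex such as "^Day(8|9|10|11|12|13|14)$".
  pattern <- paste0("^Day(", paste(days, collapse = "|"), ")$")
  # Add the column Week<w> holding the row-wise mean of the matched columns.
  data <- mutate(data,
                 !!paste0("Week", w) :=
                   rowMeans(select(data, matches(pattern)), na.rm = TRUE))
}

select(data, starts_with("Week"))
```

Because the pattern is anchored with ^ and $, the newly added Week columns are never picked up by later iterations.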
So I have two columns and need to add a third. This third column needs to contain A for a specified number of rows at the start, and B for a specified number of rows after that.
I tried adding it like this:
data_exercise_3["newcolumn"] <- (1:6)
but it didn't work. Can someone tell me what I'm doing wrong, please?
Looks like you're having a problem with subsetting a data frame correctly. I'd recommend reviewing this concept before you proceed much further, either via a Coursera course or on a website like this UCLA R learning module on subsetting data frames. Subsetting is a crucial component of data wrangling with R, and you'll go much faster with a solid foundation of the basics!
You can assign values to a subset of a data frame by using [row, column] notation. Since your data frame is called data_exercise_3 and the column you'd like to assign values to is called 'newcolumn', then assuming you want the first 6 rows as 'A' and the next 3 as 'B', you could write it like this:
data_exercise_3[1:6,'newcolumn'] <- 'A'
data_exercise_3[7:9,'newcolumn'] <- 'B'
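Alternatively, you can create the whole column in one step with rep(); the two counts must add up to the number of rows in the data frame (12 in this example):
data_exercise_3$category <- c(rep("A",6),rep("B",6))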
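Here is a self-contained sketch of the rep() idea, assuming a 9-row stand-in for data_exercise_3 (the columns x and y and their values are made up). The original attempt assigned the numbers 1 to 6 rather than letters, which is why it could not produce the A/B column.

```r
# Toy stand-in for data_exercise_3 with 9 rows (values are made up).
data_exercise_3 <- data.frame(x = 1:9, y = letters[1:9])

# rep() with a vector `times` repeats each element the given number of times:
# "A" six times, then "B" three times. The counts must sum to nrow().
data_exercise_3$newcolumn <- rep(c("A", "B"), times = c(6, 3))

# Count how many rows received each label.
table(data_exercise_3$newcolumn)
```

Keeping the counts in one times vector makes it easy to check them against nrow(data_exercise_3) before assigning.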