Transform DF Column Values to Matrix in R

I want to count how often each value of a categorical column occurs on each Date.
I want the result as a matrix whose column names are the values of the categorical variable, whose row names are the unique Date values, and whose cells hold the counts.
The links below solve the group-by problem, but I am looking for the transformed df:
How to add count of unique values by group to R data.frame
R: Extract unique values in one column grouped by values in another column
My df has more than 50,000 rows and looks like:
dat <- data.frame(Date = c('06/08/2018','06/08/2018','07/08/2018','07/08/2018','08/08/2018','09/08/2018','09/08/2018','11/08/2018','11/08/2018','13/08/2018'),
Type= c('A','B','C','A','B','A','A','B','C','C'))
I want the resulting matrix to have "A", "B" and "C" as columns, the Date values as rows, and the counts as cell values, as shown in the image below:
Also, it would be great not to hard-code the categorical values, so that if there are 4 of them in future instead of 3, the code handles it automatically.

How about using table...
mat <- table(dat$Date, dat$Type)
mat
             A B C
  06/08/2018 1 1 0
  07/08/2018 1 0 1
  08/08/2018 0 1 0
  09/08/2018 2 0 0
  11/08/2018 0 1 1
  13/08/2018 0 0 1
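If a plain matrix or a data frame is needed rather than a table object, the result can be converted. The following is a minimal base R sketch using the sample data from the question; the names tab, mat and df_wide are only illustrative.

```r
# Sample data from the question
dat <- data.frame(
  Date = c("06/08/2018", "06/08/2018", "07/08/2018", "07/08/2018", "08/08/2018",
           "09/08/2018", "09/08/2018", "11/08/2018", "11/08/2018", "13/08/2018"),
  Type = c("A", "B", "C", "A", "B", "A", "A", "B", "C", "C")
)

# Cross-tabulate: one row per Date, one column per Type (nothing hard-coded)
tab <- table(dat$Date, dat$Type)

# Drop the "table" class to get an ordinary integer matrix with dimnames
mat <- unclass(tab)

# Or keep the same wide layout as a data frame
df_wide <- as.data.frame.matrix(tab)
```

Because the columns come from the values present in Type, a fourth category would simply appear as a fourth column. Note that the rows are ordered by the Date strings as text; converting Date with as.Date(dat$Date, format = "%d/%m/%Y") first would give chronological ordering.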

What you're looking for is dcast() from the reshape2 package:
library(reshape2)
dcast(dat, Date ~ Type, fun.aggregate = length, value.var = "Type")
This function aggregates your data according to the fun.aggregate argument (in your case, length()).

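This uses spread() from tidyr, after counting each Date/Type pair:
library(tidyverse)
# count() adds a column n; spread() turns each Type into its own column
spread_data <- dat %>%
  count(Date, Type) %>%
  spread(key = Type, value = n, fill = 0)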

Related

How to remove NA values in a specific column of a dataframe in R?

I have a data frame with a large number of observations and I want to remove NA values in 1 specific column while keeping the rest of the data frame the same. I want to do this without using na.omit(). How do I do this?
We can use is.na or complete.cases to return a logical vector for subsetting
subset(df1, complete.cases(colnm))
where colnm is the actual column name
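As a minimal base R sketch of the is.na route (df1, colnm and other are placeholder names, and the data are made up for illustration), only rows with NA in the chosen column are dropped, while NA values elsewhere are kept:

```r
# Toy data: NA appears in both columns
df1 <- data.frame(colnm = c(1, NA, 3),
                  other = c(NA, "b", "c"),
                  stringsAsFactors = FALSE)

# Keep rows where colnm is not NA; the other column is left untouched
kept <- df1[!is.na(df1$colnm), ]
```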
This is how I would do it using dplyr:
library(dplyr)
df <- data.frame(a = c(1,2,NA),
b = c(5,NA,8))
filter(df, !is.na(a))
# output
  a  b
1 1  5
2 2 NA

How do I make a variable that represents the row number, when rows are not sorted in order?

Pic shows the row number order
I am trying to add a variable to my data set that represents the row number; however, all the code I've found numbers the rows in their current order (1, 2, 3, 4, 5) rather than in the order the View option shows (129, 98, 21, 09). I need the order shown in the View option, as I am trying to merge with another data set and need the correct ("original") row number.
I cannot add row numbers before making changes to the data set as the function doesn't work when I add the ID number.
Alternatively, being able to sort the data by row number would also help, but I don't know how to do that either (clicking on the arrow above the row number does nothing).
A bit of context
I am classifying network nodes in R. I made a matrix from the network's nodes and edges (using node2vec), and have to merge this matrix with a node-labels data set (which contains one variable showing whether each node is positive or negative). The picture above shows the created matrix; the row numbers from the network data set are no longer in their original order. I need to add a variable to the matrix, which I converted to a data frame using:
netdf1 <- as.data.frame(network.node2vec)
that represents the original row number
what I tried
netdf1 <- netdf1 %>% mutate(id = row_number())
This just adds the row number as the rows are currently ordered so 1,2,3,4...
What worked in the end (the correct answer):
db$ID <- rownames(db)
If I understand your question correctly, you have a data frame whose row names are not consecutive, and you want those row names in an extra column as numeric values?
You can use the row.names() function and convert the result to numeric if you like:
# just creating a DF that might show what you mean:
testDF <- data.frame(x = 1:10, y = sample((1:1000), 10))
testDF <- testDF[testDF$y < 500,]
View(testDF)
# one possible way to get the row names
testDF$rowNum <- as.numeric(row.names(testDF))
Type ?sort in the console if you would like to learn more about sorting vectors.
Let's say you have a data frame with row names that are out of order:
my_data <- data.frame(row.names = 5:1,
V1 = 1:5)
#> my_data
# V1
#5 1
#4 2
#3 3
#2 4
#1 5
dplyr::row_number() will add row numbers based on the current sorting, not based on the row names. (A general practice in the tidyverse is to eschew keeping useful data in the row names and to instead incorporate any sorts of row ID info into a variable.)
So you could use @user2554330's advice and add my_data$ID <- row.names(my_data) or the tidyverse equivalent of my_data %>% tibble::rownames_to_column(var = "ID"), then sort by that column.
library(dplyr)
my_data %>%
  tibble::rownames_to_column(var = "ID") %>%
  # row names are character; convert so that "10" sorts after "9"
  mutate(ID = as.numeric(ID)) %>%
  arrange(ID)
ID V1
1 1 5
2 2 4
3 3 3
4 4 2
5 5 1
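One thing to watch when sorting by row names is that they are stored as character strings, so they sort as text unless converted. A minimal base R sketch, using made-up row names similar to those in the question:

```r
# Row names out of order, as after filtering or reordering
my_data <- data.frame(row.names = c(129, 98, 21, 9), V1 = 1:4)

# Sorting the raw row names compares them as text
text_order <- sort(row.names(my_data))

# Store them as numbers, then order the rows numerically
my_data$ID <- as.numeric(row.names(my_data))
sorted <- my_data[order(my_data$ID), ]
```

Here text_order puts "129" before "21", whereas sorted is ordered 9, 21, 98, 129.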

Conditional grouping in column in data frame in R

I have a data frame which looks like this
where the value of b ranges from 1 to 31, and alpha_1, alpha_2 and alpha_3 can only take the values 0 and 1. For each b value I have 1000 observations, so 31000 observations in total. I want to group the entire data set by b and count the alpha columns only where the value is 1. The end result would have 31 rows (one per unique b value) with, for each alpha column, the number of 1s.
How do I do this in R? I have tried using pipes in dplyr and nothing seems to work.
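We can sum each alpha column within each group; because the values are only 0 or 1, the sum is the count of 1s: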
library(dplyr)
df1 %>%
group_by(b) %>%
summarise_at(vars(starts_with("alpha")), sum)
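For comparison, here is a hedged base R sketch of the same idea with aggregate(). The small df1 below is invented for illustration, and the `. ~ b` formula assumes that every column other than b is an alpha column.

```r
# Toy data: 0/1 indicators grouped by b
df1 <- data.frame(
  b       = c(1, 1, 1, 2, 2, 3),
  alpha_1 = c(1, 0, 1, 0, 0, 1),
  alpha_2 = c(0, 0, 1, 1, 1, 0),
  alpha_3 = c(1, 1, 1, 0, 1, 0)
)

# Sum every remaining column within each value of b;
# with 0/1 data the sum equals the number of 1s
counts <- aggregate(. ~ b, data = df1, FUN = sum)
```

The result has one row per unique b value, which would be 31 rows for the data described in the question.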

Summing values in two different columns in R

I have a dataset in which I wish to sum each value in column n, with its corresponding value in column (n+(ncol/2)); i.e., so I can sum a value in column 1 row 1 with a value in column 12 row 1, for a dataset with 22 columns, and repeat this until column 11 is summed with column 22. The solution needs to work for hundreds of rows.
How do I do this using R, while ignoring the column names?
Suppose your data is
d <- setNames(as.data.frame(matrix(rnorm(100 * 22), ncol = 22)), LETTERS[1:22])
You can do a simple matrix addition using numbers to select the columns:
output <- d[, 1:11] + d[, 12:22]
so, e.g.
all.equal(output[,1], d[,1] + d[,12])
# [1] TRUE
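To avoid hard-coding 11 and 22, the split point can be derived from the number of columns. This is a small sketch assuming an even column count; the variable half is just an illustrative name.

```r
set.seed(1)
d <- as.data.frame(matrix(rnorm(5 * 22), ncol = 22))

# Split point: column n is paired with column n + half
half <- ncol(d) / 2
stopifnot(half == round(half))  # requires an even number of columns

# Add the first half to the second half, selecting columns by position
output <- d[, seq_len(half)] + d[, half + seq_len(half)]
```

The result keeps the column names of the first half, and the column names themselves play no part in the pairing.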

How to create temporal sequence column C so that in each level of column A, a specific message from column B corresponds to 0 in column C

My data contains columns trial,sequence and message, the message Onset occurs only once in each trial, but at different sequence positions in different trials.
data<-data.frame(trial=c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3),sequence=c(1:10,1:10,1:10),message=c(NA,NA,NA,NA,"Onset",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,"Onset",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,"Onset",NA,NA,NA))
I want to create a new column called sequence_new so that in each trial level, the message Onset corresponds to "0" in the new column, like the following:
data_n<-data.frame(trial=c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3),sequence=c(1:10,1:10,1:10),message=c(NA,NA,NA,NA,'Onset',NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,'Onset',NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,'Onset',NA,NA,NA),sequence_new=c(-4,-3,-2,-1,0,1,2,3,4,5,-5,-4,-3,-2,-1,0,1,2,3,4,-6,-5,-4,-3,-2,-1,0,1,2,3))
Try
library(data.table)
setDT(data)[, sequence_new:=(1:.N)-which(message=='Onset'),trial]
Or
library(dplyr)
data %>%
group_by(trial) %>%
mutate(sequence_new = row_number()- which(message=='Onset'))
Or using base R
data$sequence_new <- with(data, ave(seq_along(message), trial,
FUN=seq_along) -ave(message=='Onset', trial, FUN=which))
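The answers above use the row position within each trial. A variant that subtracts the value of the sequence column at the Onset row instead can be sketched as follows; it assumes exactly one Onset per trial, and the helper names is_onset and onset_at are hypothetical.

```r
# Rebuild the example data: Onset at sequence 5, 6 and 7 in trials 1, 2 and 3
data <- data.frame(trial = rep(1:3, each = 10),
                   sequence = rep(1:10, 3),
                   message = NA_character_,
                   stringsAsFactors = FALSE)
data$message[c(5, 16, 27)] <- "Onset"

# %in% treats NA as "no match", so this is a clean TRUE/FALSE vector
is_onset <- data$message %in% "Onset"

# Lookup table: trial -> sequence value at which Onset occurs
onset_at <- setNames(data$sequence[is_onset], data$trial[is_onset])

# Shift each row by its own trial's onset position
data$sequence_new <- data$sequence - unname(onset_at[as.character(data$trial)])
```

For this data the two approaches agree, because sequence runs from 1 to 10 within every trial.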
