Convert pivot table generated from pivottabler package to dataframe - r

I'm trying to make a pivot table with the pivottabler package. I want to convert the pivot table object to a data frame so that I can turn it into a data table (with DT) and render it in a Shiny app, where it can be downloaded.
library(pivottabler)
pt = qpvt(mtcars, 'cyl', 'vs', 'n()')
I tried to convert it to a data frame:
as.data.frame(pt)
I got the error message below:
Error in as.data.frame.default(pt) : cannot coerce class ‘c("PivotTable", "R6")’ to a data.frame
Does anyone know how to convert the pivot table object to a data frame?

The object is an instance of an R6 class. One option is to extract the data with its asDataFrame method, which shows up when we inspect the object with str:
str(pt)
#...
#...
#asDataFrame: function (separator = " ", stringsAsFactors = default.stringsAsFactors())
#asJSON: function ()
#asList: function ()
#asMatrix: function (includeHeaders = TRUE, repeatHeaders = FALSE, rawValue = FALSE)
#asTidyDataFrame: function (includeGroupCaptions = TRUE, includeGroupValues = TRUE,
...
So, calling asDataFrame() on the R6 object:
out <- pt$asDataFrame()
out
# 0 1 Total
#4 1 10 11
#6 3 4 7
#8 14 NA 14
#Total 18 14 32
str(out)
#'data.frame': 4 obs. of 3 variables:
#$ 0 : int 1 3 14 18
#$ 1 : int 10 4 NA 14
#$ Total: int 11 7 14 32
Or, to get a matrix, use asMatrix():
pt$asMatrix()
# [,1] [,2] [,3] [,4]
#[1,] "" "0" "1" "Total"
#[2,] "4" "1" "10" "11"
#[3,] "6" "3" "4" "7"
#[4,] "8" "14" "" "14"
#[5,] "Total" "18" "14" "32"
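Since the goal in the question was a downloadable table in Shiny, here is a minimal sketch of how the extracted data frame might be handed to DT. The choice of the Buttons extension and of these particular buttons is my assumption, not something from the question, and the widget is built outside of any app for brevity:

```r
library(pivottabler)
library(DT)

# Build the pivot table and pull it out as a plain data frame
pt <- qpvt(mtcars, "cyl", "vs", "n()")
df <- pt$asDataFrame()

# Assumption: export buttons are wanted; "Buttons" is a DT extension
widget <- datatable(
  df,
  extensions = "Buttons",
  options = list(dom = "Bfrtip", buttons = c("copy", "csv", "excel"))
)

# In a Shiny server function this widget could be returned from
# DT::renderDT() and displayed with DT::DTOutput() in the UI.
widget
```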

Related

Conditionally replace a value in a matrix with a new value if condition is TRUE

I am currently trying to create a new matrix by looping over the old one. In the new matrix I want to replace certain values with the character "recoding". Both of the matrices should have 10 columns and 100 rows.
In the current case, a value should be replaced if it matches one of the values in vector_A.
e.g.:
for (i in 1:10) {
new_matrix[,i] <- old_matrix[,i]
output_t_or_f <- is.element(new_matrix[,i],unlist(vector_A))
if (any(output_t_or_f, na.rm = FALSE)) {
replace(new_matrix, list = new_matrix[,i], values = "recode")
}
}
So output_t_or_f should be TRUE or FALSE, depending on whether the value is in vector_A,
and if output_t_or_f is TRUE then the old value should be replaced with the character "recode".
Currently the new_matrix looks just like the old_matrix, so I guess there is a problem with the if statement?
Unfortunately, I can't really share my data, but I put some example data together.
If old_matrix looks like this:
> old_matrix
[,1] [,2] [,3] [,4] [,5]
[1,] 1 6 11 16 21
[2,] 2 7 12 17 22
[3,] 3 8 13 18 23
[4,] 4 9 14 19 24
[5,] 5 10 15 20 25
and vector_A looks like this:
> vector_A
[1] 12 27 30 42 37 9
then the new matrix should look like this:
new_matrix
[,1] [,2] [,3] [,4] [,5]
[1,] "1" "6" "11" "16" "21"
[2,] "2" "7" "recoding" "17" "22"
[3,] "3" "8" "13" "18" "23"
[4,] "4" "recoding" "14" "19" "24"
[5,] "5" "10" "15" "20" "25"
I am very new to R and can't seem to find the problem. Would appreciate any help!!
Thanks :-)
Since the replacements are the same in every column you shouldn't need a loop. Try this:
new_matrix <- old_matrix
new_matrix[new_matrix %in% vector_A] <- "recode"
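As for why the loop in the question left the matrix unchanged: replace() returns a modified copy rather than changing its argument in place, and its list argument expects indices, not the values themselves. A small sketch using the example data from the question (the matrix is rebuilt here with matrix(1:25, nrow = 5), which matches the printed old_matrix):

```r
old_matrix <- matrix(1:25, nrow = 5)
vector_A <- c(12, 27, 30, 42, 37, 9)

new_matrix <- old_matrix

# The result of replace() is thrown away here, so new_matrix stays as it was
invisible(replace(new_matrix, new_matrix %in% vector_A, "recode"))
stopifnot(is.numeric(new_matrix))

# Assigning the result back is what actually changes new_matrix
new_matrix <- replace(new_matrix, new_matrix %in% vector_A, "recode")
new_matrix
```

After the assignment the whole matrix is character, because a matrix can only hold one type.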

R Writing to data frame from inside for-loop

Brand new to R programming, so please forgive me if I'm using the wrong terminology.
I'm trying to insert/append values to a data frame from inside a for-loop.
I can get the right values if I just print() them, but when I try to put it inside the data frame, I get mostly NA's. If I run this code it prints out the values I want.
output <- data.frame()
for (i in seq_along(Reasons)){
assign(paste(Reasons[i]), sum(ER$Reason == paste(Reasons[i])))
Tot <- get(paste(Reasons[i]))
assign(paste(Reasons[i],'ER',sep="_"), sum(grepl("ER|Er", ER$Disposition) & ER$Reason == paste(Reasons[i])))
Er <- get(paste(Reasons[i],'ER',sep="_"))
assign(paste(Reasons[i],'adm',sep="_"), sum(grepl("Admi|admi|ADMI|ADmi", ER$Disposition) & ER$Reason == paste(Reasons[i])))
Adm <- get(paste(Reasons[i],'adm',sep="_"))
assign(paste(Reasons[i],'admrate',sep="_"), sprintf("%.0f%%", (sum(grepl("Admi|admi|ADMI|ADmi", ER$Disposition) & ER$Reason == paste(Reasons[i])))/(sum(ER$Reason == paste(Reasons[i])))*100))
Rate <- get(paste(Reasons[i],'admrate',sep="_"))
print(c(Er,Adm,Tot,Rate))
#clear variables just created
rm(list=ls(pattern=Reasons[i]))
rm(Tot,Er,Adm,Rate)
}
[1] "7" "13" "20" "65%"
[1] "4" "8" "12" "67%"
[1] "12" "12" "24" "50%"
[1] "23" "7" "30" "23%"
[1] "7" "1" "8" "12%"
[1] "3" "1" "4" "25%"
[1] "3" "0" "3" "0%"
[1] "6" "5" "11" "45%"
[1] "2" "9" "11" "82%"
[1] "2" "4" "6" "67%"
[1] "10" "4" "14" "29%"
[1] "5" "0" "5" "0%"
[1] "10" "4" "14" "29%"
[1] "0" "3" "3" "100%"
[1] "7" "3" "10" "30%"
[1] "0" "4" "4" "100%"
But when I use
output <- rbind(output, c(Er, Adm, Tot, Rate))
Instead of
print(c(Er,Adm,Tot,Rate))
I get the first row of values (7, 13, 20, 65%), then all NA's except the "7" in rows 5 and 15... What am I doing wrong?
Thank you in advance
As I don't know what your data look like I cannot reproduce your error. If I understand it correctly, for each value in Reasons you want to find (a) the total number of observations, (b) the number of observations with the string "Er" in the variable Disposition, (c) the number of observations with the string "Admi" in the variable Disposition and (d) the percentage of observations with the string "Admi" in the variable Disposition. If that is the case then you don't have to use assign and get to do this.
Here is a simpler way to do it (although it's not the best way to do it, see below):
## Here I just generated some data that might look like the data
## you are dealing with:
Reasons <- LETTERS[1:10]
ER <- data.frame(Reason = LETTERS[sample.int(10,100, replace = TRUE)],
Disposition = c("ER", "Admi", "SomethingElse")[sample.int(3,100, replace = TRUE)])
output <- data.frame()
for (i in seq(along = Reasons)){
Tot <- sum(ER$Reason ==Reasons[i])
Er <- sum(grepl("ER|Er", ER$Disposition) & (ER$Reason ==Reasons[i]))
Adm <- sum(grepl("Admi|admi|ADMI|ADmi", ER$Disposition) & (ER$Reason ==Reasons[i]))
Rate <- paste(round(Adm/Tot*100), "%")
output <- rbind(output, c(Er, Adm, Tot, Rate))
}
> output
X.4. X.3. X.10. X.30...
1 4 3 10 30 %
2 2 3 6 50 %
3 2 1 6 17 %
4 5 2 14 14 %
5 3 5 11 45 %
6 2 4 11 36 %
7 3 6 14 43 %
8 2 2 5 40 %
9 1 7 11 64 %
10 4 4 12 33 %
Dynamically appending rows to a data frame or matrix is generally not a good idea: each rbind call copies the whole object, so it becomes slow and memory intensive as the result grows. If you know the dimensions of your matrix beforehand (as you do), you should initialize it with the right size and then fill in the entries inside your loop:
## Initialize data:
output <- matrix(nrow = length(Reasons), ncol = 4)
for (i in seq(along = Reasons)){
Tot <- sum(ER$Reason ==Reasons[i])
Er <- sum(grepl("ER|Er", ER$Disposition) & (ER$Reason ==Reasons[i]))
Adm <- sum(grepl("Admi|admi|ADMI|ADmi", ER$Disposition) & (ER$Reason ==Reasons[i]))
Rate <- paste(round(Adm/Tot*100), "%")
output[i,] <- c(Er, Adm, Tot, Rate)
}
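Note that the preallocated matrix above ends up as a character matrix, because Rate is a string and a matrix holds a single type. If you want the counts to stay numeric, one possible variation (a sketch on simulated data like the above, with a fixed seed that I added for reproducibility) is to preallocate one vector per column and build the data frame once at the end:

```r
set.seed(1)  # assumption: seed only so the simulated data is reproducible
Reasons <- LETTERS[1:10]
ER <- data.frame(
  Reason = sample(Reasons, 100, replace = TRUE),
  Disposition = sample(c("ER", "Admi", "SomethingElse"), 100, replace = TRUE)
)

# Preallocate one integer vector per count column
n <- length(Reasons)
Er <- integer(n)
Adm <- integer(n)
Tot <- integer(n)

for (i in seq_along(Reasons)) {
  is_reason <- ER$Reason == Reasons[i]
  Tot[i] <- sum(is_reason)
  Er[i] <- sum(grepl("ER|Er", ER$Disposition) & is_reason)
  Adm[i] <- sum(grepl("Admi|admi|ADMI|ADmi", ER$Disposition) & is_reason)
}

# Build the data frame once; only Rate is stored as text
output <- data.frame(Reason = Reasons, Er, Adm, Tot,
                     Rate = sprintf("%.0f%%", 100 * Adm / Tot))
output
```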
There are, however, even simpler ways to do this kind of evaluation. For example, you could use the dplyr package, where you can group the data by a variable (the different values of ER$Reason in your case) and then evaluate the values you need:
## Load the package 'dplyr'
library(dplyr)
## Group the variable and evaluate:
output <- ER %>% group_by(Reason) %>%
dplyr::summarise(Er = sum(grepl("ER|Er", Disposition)),
Adm = sum(grepl("Admi|admi|ADMI|ADmi", Disposition)),
Tot = n(),
Rate = paste(round(Adm/Tot*100), "%"))
> output
# A tibble: 10 × 5
Reason Er Adm Tot Rate
<chr> <int> <int> <int> <chr>
1 A 4 3 10 30 %
2 B 2 3 6 50 %
3 C 2 1 6 17 %
4 D 5 2 14 14 %
5 E 3 5 11 45 %
6 F 2 4 11 36 %
7 G 3 6 14 43 %
8 H 2 2 5 40 %
9 I 1 7 11 64 %
10 J 4 4 12 33 %

how to select only integer values of a column [duplicate]

This question already has answers here:
How to convert a factor to integer\numeric without loss of information?
(12 answers)
Closed 6 years ago.
My data has many columns with different names. I want to extract only the numeric values in the column name_id and store them in z.
z should contain only the numeric values of name_id; any value that contains letters should not be stored in z.
z <- unique(data$name_id)
z
#[1] 10 11 12 13 14 3 4 5 6 7 8 9
#Levels: 10 11 12 13 14 3 4 5 6 7 8 9 a b c d e f
When I tried this:
z <- unique(as.numeric(data$name_id))
z
# [1] 1 2 3 4 5 6 7 8 9 10 11 12
the output only contains values up to 12, but the column has values greater than 12 as well.
Treating your data as a character vector such as
> b
[1] "1" "2" "3" "4" "5" "13" "14" "15" "45" "567" "999" "Name" "Age"
apply this (str_extract comes from the stringr package):
library(stringr)
regexp <- "[[:digit:]]+"
z <- str_extract(b, regexp)
z[is.na(z)] <- ""
> z
[1] "1" "2" "3" "4" "5" "13" "14" "15" "45" "567" "999" "" ""
Hope this helps .
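Because name_id in the question is a factor, as.numeric() returns the internal level codes (1, 2, 3, ...) rather than the labels, which is why the output stopped at 12. A minimal base R sketch, using a made-up factor that mimics the levels shown in the question: convert to character first, then to numeric, and drop whatever fails to parse.

```r
# Hypothetical stand-in for data$name_id, mimicking the levels in the question
name_id <- factor(c("10", "11", "12", "13", "14", "3", "4", "5",
                    "6", "7", "8", "9", "a", "b", "c"))

# as.character() recovers the labels; non-numeric labels become NA
vals <- suppressWarnings(as.numeric(as.character(name_id)))

# Keep only the values that parsed as numbers
z <- unique(vals[!is.na(vals)])
z
```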

Importing non-rectangular data as rectangular in R

I need to load social network data where each user has an unknown and potentially large number of friends, stored as a text file of the following format:
UserId: FriendId1, FriendId2, ...
1: 12, 33
2:
3: 4, 6, 10, 15, 16
into a two-column data.frame:
UserId FriendId
1 1 12
2 1 33
3 3 4
4 3 6
5 3 10
6 3 15
7 3 16
How would you do that in R?
Reading, filling and then reshaping is inefficient, as it requires keeping many columns full of NA in memory.
Related questions here, and here.
If you really have a colon as a delimiter, then just use read.table with header = FALSE to get your data into R, then consider using cSplit from my "splitstackshape" package.
mydf <- read.table("test.txt", sep = ":", header = FALSE)
mydf
## V1 V2
## 1 1 12, 33
## 2 2
## 3 3 4, 6, 10, 15, 16
library(splitstackshape)
cSplit(mydf, "V2", ",", "long")
## V1 V2
## 1: 1 12
## 2: 1 33
## 3: 3 4
## 4: 3 6
## 5: 3 10
## 6: 3 15
## 7: 3 16
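This reads the lines, then parses them one by one into two-column matrices. It produces character values (since lines of text are just characters), but it's trivial to coerce them to numeric:
## read the raw lines first (file name taken from the example above)
rLines <- readLines("test.txt")
do.call(rbind, lapply(rLines, function(L) {
  n <- sub(":.*", "", L)   # user id: everything before the colon
  items <- scan(text = sub(".*:", "", L), sep = ",", quiet = TRUE)   # friend ids after the colon
  matrix(c(rep(n, length(items)), items), ncol = 2)
}))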
#---------
[,1] [,2]
[1,] "1" "12"
[2,] "1" "33"
[3,] "3" "4"
[4,] "3" "6"
[5,] "3" "10"
[6,] "3" "15"
[7,] "3" "16"
If the path forward isn't trivial to you then educate yourself at ?as.numeric and ?as.data.frame.
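For completeness, a short sketch of that coercion step. The character matrix is typed in by hand here to match the output above, and the column names are taken from the question:

```r
# Character matrix as produced by the parsing step above
m <- cbind(c("1", "1", "3", "3", "3", "3", "3"),
           c("12", "33", "4", "6", "10", "15", "16"))

# Convert each column to numeric while building the data frame
friends <- data.frame(UserId = as.numeric(m[, 1]),
                      FriendId = as.numeric(m[, 2]))
str(friends)
```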

R - Splitting a column text into 2 columns without delimiter

I need to manipulate the following data frame (data) so that the PATCH_CODE column is split into two columns, where the first contains the letter of the string and the second contains the number, as in the second example data frame below.
EDIT: PATCH_CODE is not always 2 characters long. Occasionally it is a single letter, in which case I need to force a 1 into the resulting CODE column.
Initial data frame: head(data,4)
PATCH_CODE TERR PC1
A1 MENS_10 0.8629186
A3 MENS_10 -0.2703238
B1 MENS_10 0.9516067
B2 MENS_10 -0.1722446
resulting data frame:
PATCH CODE TERR PC1
A 1 MENS_10 0.8629186
A 3 MENS_10 -0.2703238
B 1 MENS_10 0.9516067
B 2 MENS_10 -0.1722446
I have seen examples of how to accomplish this when the column to be split has an identifiable text delimiter such as a comma by using colsplit in reshape but I have failed to find a solution for a structure like mine. Is this possible?
output of str(data)
'data.frame': 240 obs. of 3 variables:
$ PATCH_CODE: Factor w/ 42 levels "A","A1","A2",..: 2 3 4 7 8 12 13 16 17 18 ...
$ TERR : Factor w/ 19 levels "MENS_10","MENS_14",..: 1 1 1 1 1 1 1 1 1 1 ...
$ PC1 : num 0.548 1.228 0.273 5.548 3.853 ...
You can use strsplit. Passing an empty string as the delimiter splits at every character. (If the column is a factor, as PATCH_CODE is here, wrap it in as.character() first.)
a <- c("A1", "B1", "C2", "D5", "R3")
strsplit(a, "")
[[1]]
[1] "A" "1"
[[2]]
[1] "B" "1"
[[3]]
[1] "C" "2"
[[4]]
[1] "D" "5"
[[5]]
[1] "R" "3"
If you want to put that in a matrix:
> do.call(rbind, strsplit(a, ""))
[,1] [,2]
[1,] "A" "1"
[2,] "B" "1"
[3,] "C" "2"
[4,] "D" "5"
[5,] "R" "3"
By the sounds of your description, strsplit should work fine. If your data are a little more complicated, you can also look at a possible regex-based solution.
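For this particular example, try the following (PATCH_CODE is a factor, so it has to be converted with as.character first, otherwise strsplit fails with "non-character argument"):
do.call(rbind, strsplit(as.character(mydf$PATCH_CODE),
                        split = "(?<=[a-zA-Z])(?=[0-9])",
                        perl = TRUE))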
# [,1] [,2]
# [1,] "A" "1"
# [2,] "A" "3"
# [3,] "B" "1"
# [4,] "B" "2"
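Neither approach above covers the case from the EDIT, where a code is a single letter and the number should default to 1. A possible base R sketch that handles it, assuming codes always consist of letters followed by optional digits (the sample vector is made up):

```r
# Made-up sample, including a letter-only code and a two-digit number
codes <- c("A1", "A3", "B", "B12")

# Letter part: strip trailing digits; number part: strip leading letters
patch <- sub("[0-9]+$", "", codes)
code <- sub("^[A-Za-z]+", "", codes)

# Letter-only codes leave an empty number part; force it to "1"
code[code == ""] <- "1"

result <- data.frame(PATCH = patch, CODE = as.integer(code))
result
```

With a factor column, pass as.character(data$PATCH_CODE) instead of the sample vector.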
