how to select only integer values of a column [duplicate] - r

My data has many columns with different names. I want to see only the numeric values in the column name_id and store them in z.
If the column contains any alphabetic values, they should not be stored in z.
z <- unique(data$name_id)
z
#[1] 10 11 12 13 14 3 4 5 6 7 8 9
#Levels: 10 11 12 13 14 3 4 5 6 7 8 9 a b c d e f
When I tried this:
z <- unique(as.numeric(data$name_id))
z
# [1] 1 2 3 4 5 6 7 8 9 10 11 12
The output only contains values up to 12, but the column also has values greater than 12.

Considering your data as the character vector
> b
[1] "1" "2" "3" "4" "5" "13" "14" "15" "45" "567" "999" "Name" "Age"
Apply this (str_extract() comes from the stringr package):
> library(stringr)
> regexp <- "[[:digit:]]+"
> z <- str_extract(b, regexp)
> z[is.na(z)] <- ""
> z
[1] "1" "2" "3" "4" "5" "13" "14" "15" "45" "567" "999" "" ""
Hope this helps.
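The output in the question (values 1 to 12) is what as.numeric() returns for a factor: the internal level codes, not the labels. A minimal base R sketch of an alternative, assuming name_id is a factor like the one shown in the question (the vector below is a made-up stand-in for data$name_id):

```r
# Hypothetical stand-in for data$name_id: a factor mixing digits and letters.
name_id <- factor(c("10", "11", "12", "13", "14", "3", "4", "a", "b", "7", "3"))

# Convert the labels (not the internal level codes) to character first.
labels <- as.character(name_id)

# Keep only entries made up entirely of digits, then convert to integer.
z <- unique(as.integer(labels[grepl("^[0-9]+$", labels)]))
z
```

Unlike the str_extract() approach, this drops non-numeric entries instead of replacing them with empty strings, and it returns integers rather than character values.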

Related

Convert pivot table generated from pivottabler package to dataframe

I'm trying to make a pivot table with pivottabler package. I want to convert the pivot table object to dataframe, so that I can convert it to data table (with DT) and render it in Shiny app, so that it's downloadable.
library(pivottabler)
pt = qpvt(mtcars, 'cyl', 'vs', 'n()')
I tried to convert it to a data frame:
as.data.frame(pt)
I got error message like below:
Error in as.data.frame.default(pt) : cannot coerce class ‘c("PivotTable", "R6")’ to a data.frame
Does anyone know how to convert the pivot table object to dataframe?
It is an R6 class. One option is to extract the data with asDataFrame, a method that shows up when we inspect the object with str:
str(pt)
#...
#...
#asDataFrame: function (separator = " ", stringsAsFactors = default.stringsAsFactors())
#asJSON: function ()
#asList: function ()
#asMatrix: function (includeHeaders = TRUE, repeatHeaders = FALSE, rawValue = FALSE)
#asTidyDataFrame: function (includeGroupCaptions = TRUE, includeGroupValues = TRUE,
...
Therefore, applying asDataFrame() on the R6 object
out <- pt$asDataFrame()
out
# 0 1 Total
#4 1 10 11
#6 3 4 7
#8 14 NA 14
#Total 18 14 32
str(out)
#'data.frame': 4 obs. of 3 variables:
#$ 0 : int 1 3 14 18
#$ 1 : int 10 4 NA 14
#$ Total: int 11 7 14 32
or to get a matrix, asMatrix
pt$asMatrix()
# [,1] [,2] [,3] [,4]
#[1,] "" "0" "1" "Total"
#[2,] "4" "1" "10" "11"
#[3,] "6" "3" "4" "7"
#[4,] "8" "14" "" "14"
#[5,] "Total" "18" "14" "32"

Create a new data frame every time it encounters a value

I need to split a data frame based on a condition. For example, I have a data frame my_df with a variable k that has no negative values, and I need to split my_df every time it encounters 0. To make this clearer, below is the code that creates my_df.
my_df <- data.frame("k" = c(0, 0,0, 0.1,1.3,4,5,7,8,11,14,17,10,5,0.4,0,0,0,1.0,2.3,5,7,3,0.1,0))
Upon executing the above code my dataframe is as shown below,
row_number k
1 0
2 0
3 0
4 0.1
5 1.3
6 4
7 5
8 7
9 8
10 11
11 14
12 17
13 10
14 5
15 0.4
16 0
17 0
18 0
19 1.0
20 2.3
21 5
22 7
23 3
24 0.1
25 0
My expected output is the data frame split each time a new run of zeros begins.
That is, a new data frame df1 is created containing rows 1 to 15, another data frame df2 containing rows 16 to 24, and another data frame df3 containing row 25, and so on until the end of the data frame.
I found that split() can split a data frame, but I do not know how to express my requirement with it.
From data.table you can use the function rleidv() to create a grouping variable:
library("data.table")
my_df <- data.frame("k" = c(0, 0,0, 0.1,1.3,4,5,7,8,11,14,17,10,5,0.4,0,0,0,1.0,2.3,5,7,3,0.1,0))
split(my_df, (rleidv(my_df$k==0) - 1) %/% 2)
Here is a solution with base R:
r <- rle(my_df$k!=0)
r$values <- gl((length(r$values) + 1) %/% 2, k=2, length=length(r$values))
split(my_df, inverse.rle(r))
We can create a grouping variable with cumsum and diff, then split the 'my_df' based on it to have a list of data.frames
lst <- split(my_df, cumsum(c(TRUE, diff(!my_df$k) ==1)))
lapply(lst, row.names)
#$`1`
#[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15"
#$`2`
#[1] "16" "17" "18" "19" "20" "21" "22" "23" "24"
#$`3`
#[1] "25"
NOTE: No packages are used. Only base R methods are used.
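The same grouping idea can be written out step by step. The sketch below marks a new group wherever a zero follows a non-zero value and names the pieces df1, df2, df3 as the question asks; the names is_zero, starts and pieces are illustrative only.

```r
my_df <- data.frame(k = c(0, 0, 0, 0.1, 1.3, 4, 5, 7, 8, 11, 14, 17, 10, 5, 0.4,
                          0, 0, 0, 1.0, 2.3, 5, 7, 3, 0.1, 0))

is_zero <- my_df$k == 0

# A new group starts at row 1 and wherever a zero follows a non-zero value.
starts <- c(TRUE, is_zero[-1] & !is_zero[-length(is_zero)])

# cumsum() turns the start flags into a running group id for split().
pieces <- split(my_df, cumsum(starts))
names(pieces) <- paste0("df", seq_along(pieces))

# Number of rows in each piece.
sapply(pieces, nrow)
```

Keeping the pieces in a named list (pieces$df1, pieces$df2, ...) is usually easier to work with than creating separate variables in the workspace.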

Convert a string that contains numbers to a vector of numbers [closed]

I want to convert a string that contains numbers into a vector with the same numbers. What I have done so far:
x <- "1234567890"
split <- unlist(strsplit(x,split = NULL))
split
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "0"
str(split)
chr [1:10] "1" "2" "3" "4" "5" "6" "7" "8" "9" "0"
So my split variable is still a string. I would like to end up with a vector:
vector <- c(1,2,3,4,5,6,7,8,9,0)
str(vector)
num [1:10] 1 2 3 4 5 6 7 8 9 0
How can I do that? Thank you
Use as.numeric like this:
x <- "1234567890"
split <- unlist(strsplit(x,split = NULL))
as.numeric(split)
[1] 1 2 3 4 5 6 7 8 9 0
Use as.numeric like
as.numeric(split)
#[1] 1 2 3 4 5 6 7 8 9 0
or
as.integer(split)
In Python, the equivalent is to wrap the string with list, which returns a list of characters
list('123456')
#['1', '2', '3', '4', '5', '6']
and for conversion to integer
list(map(int, '123456'))
#[1, 2, 3, 4, 5, 6]
I prefer to do it in one line
x <- "1234567890"
as.numeric(strsplit(as.character(x), "")[[1]])
Result
[1] 1 2 3 4 5 6 7 8 9 0
A more esoteric solution...
as.integer(charToRaw("1234567890"))-48
[1] 1 2 3 4 5 6 7 8 9 0
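The 48 in the last answer is the ASCII code of the character "0", so subtracting it maps the codes of "0" to "9" onto 0 to 9. A small sketch of the same idea using utf8ToInt(), which avoids the hard-coded constant; it assumes the string contains only the ASCII digits 0-9, which the code checks first:

```r
x <- "1234567890"

# Guard: the code-point trick is only meaningful for plain ASCII digits.
stopifnot(grepl("^[0-9]+$", x))

# Code point of each character minus the code point of "0".
digits <- utf8ToInt(x) - utf8ToInt("0")
digits
```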

R Writing to data frame from inside for-loop

Brand new to R programming so please forgive me if I'm using wrong terminologies.
I'm trying to insert/append values to a data frame from inside a for-loop.
I can get the right values if I just print() them, but when I try to put it inside the data frame, I get mostly NA's. If I run this code it prints out the values I want.
output <- data.frame()
for (i in seq_along(Reasons)){
assign(paste(Reasons[i]), sum(ER$Reason == paste(Reasons[i])))
Tot <- get(paste(Reasons[i]))
assign(paste(Reasons[i],'ER',sep="_"), sum(grepl("ER|Er", ER$Disposition) & ER$Reason == paste(Reasons[i])))
Er <- get(paste(Reasons[i],'ER',sep="_"))
assign(paste(Reasons[i],'adm',sep="_"), sum(grepl("Admi|admi|ADMI|ADmi", ER$Disposition) & ER$Reason == paste(Reasons[i])))
Adm <- get(paste(Reasons[i],'adm',sep="_"))
assign(paste(Reasons[i],'admrate',sep="_"), sprintf("%.0f%%", (sum(grepl("Admi|admi|ADMI|ADmi", ER$Disposition) & ER$Reason == paste(Reasons[i])))/(sum(ER$Reason == paste(Reasons[i])))*100))
Rate <- get(paste(Reasons[i],'admrate',sep="_"))
print(c(Er,Adm,Tot,Rate))
#clear variables just created
rm(list=ls(pattern=Reasons[i]))
rm(Tot,Er,Adm,Rate)
}
[1] "7" "13" "20" "65%"
[1] "4" "8" "12" "67%"
[1] "12" "12" "24" "50%"
[1] "23" "7" "30" "23%"
[1] "7" "1" "8" "12%"
[1] "3" "1" "4" "25%"
[1] "3" "0" "3" "0%"
[1] "6" "5" "11" "45%"
[1] "2" "9" "11" "82%"
[1] "2" "4" "6" "67%"
[1] "10" "4" "14" "29%"
[1] "5" "0" "5" "0%"
[1] "10" "4" "14" "29%"
[1] "0" "3" "3" "100%"
[1] "7" "3" "10" "30%"
[1] "0" "4" "4" "100%"
But when I use
output <- rbind(output, c(Er, Adm, Tot, Rate))
Instead of
print(c(Er,Adm,Tot,Rate))
I get the first row of values (7, 13, 20, 65%), then all NA's except the "7" in rows 5 and 15... What am I doing wrong?
Thank you in advance
As I don't know what your data look like I cannot reproduce your error. If I understand it correctly, for each value in Reasons you want to find (a) the total number of observations, (b) the number of observations with the string "Er" in the variable Disposition, (c) the number of observations with the string "Admi" in the variable Disposition and (d) the percentage of observations with the string "Admi" in the variable Disposition. If that is the case then you don't have to use assign and get to do this.
Here is a simpler way to do it (although it's not the best way to do it, see below):
## Here I just generated some data that might look like the data
## you are dealing with:
Reasons <- LETTERS[1:10]
ER <- data.frame(Reason = LETTERS[sample.int(10,100, replace = TRUE)],
Disposition = c("ER", "Admi", "SomethingElse")[sample.int(3,100, replace = TRUE)])
output <- data.frame()
for (i in seq_along(Reasons)) {
  Tot <- sum(ER$Reason == Reasons[i])
  Er <- sum(grepl("ER|Er", ER$Disposition) & (ER$Reason == Reasons[i]))
  Adm <- sum(grepl("Admi|admi|ADMI|ADmi", ER$Disposition) & (ER$Reason == Reasons[i]))
  Rate <- paste(round(Adm/Tot*100), "%")
  output <- rbind(output, c(Er, Adm, Tot, Rate))
}
> output
X.4. X.3. X.10. X.30...
1 4 3 10 30 %
2 2 3 6 50 %
3 2 1 6 17 %
4 5 2 14 14 %
5 3 5 11 45 %
6 2 4 11 36 %
7 3 6 14 43 %
8 2 2 5 40 %
9 1 7 11 64 %
10 4 4 12 33 %
Dynamically appending rows to a data frame or matrix is generally not a very good idea as it is quite memory intensive. If you know the dimensions of your matrix beforehand (as you do) you should initialize it with the right size and then fill the entries inside your loop:
## Initialize data:
output <- matrix(nrow = length(Reasons), ncol = 4)
for (i in seq_along(Reasons)) {
  Tot <- sum(ER$Reason == Reasons[i])
  Er <- sum(grepl("ER|Er", ER$Disposition) & (ER$Reason == Reasons[i]))
  Adm <- sum(grepl("Admi|admi|ADMI|ADmi", ER$Disposition) & (ER$Reason == Reasons[i]))
  Rate <- paste(round(Adm/Tot*100), "%")
  output[i, ] <- c(Er, Adm, Tot, Rate)
}
There are, however, even simpler ways to do this kind of evaluation. You could, for example, use the dplyr package, where you can group the data by a variable (the different values of ER$Reason in your case) and then evaluate the values you need:
## Load the package 'dplyr'
library(dplyr)
## Group the variable and evaluate:
output <- ER %>%
  group_by(Reason) %>%
  dplyr::summarise(Er = sum(grepl("ER|Er", Disposition)),
                   Adm = sum(grepl("Admi|admi|ADMI|ADmi", Disposition)),
                   Tot = n(),
                   Rate = paste(round(Adm/Tot*100), "%"))
> output
# A tibble: 10 × 5
Reason Er Adm Tot Rate
<chr> <int> <int> <int> <chr>
1 A 4 3 10 30 %
2 B 2 3 6 50 %
3 C 2 1 6 17 %
4 D 5 2 14 14 %
5 E 3 5 11 45 %
6 F 2 4 11 36 %
7 G 3 6 14 43 %
8 H 2 2 5 40 %
9 I 1 7 11 64 %
10 J 4 4 12 33 %
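A note on the NA values in the question: the pattern described (first row intact, later rows NA except where a value repeats one from the first row) is consistent with rbind() having stored the first row as factor columns, which was the default for strings before R 4.0, so later values that are not existing levels become NA. This is an inference from the symptoms, not something checked against the original data. A minimal base R sketch that sidesteps the issue builds one single-row data frame per reason and binds them once at the end; the data is simulated and the helper name summarise_reason is hypothetical:

```r
# Simulated data with the same shape as the example above (values are made up).
set.seed(1)
Reasons <- LETTERS[1:10]
ER <- data.frame(
  Reason = sample(Reasons, 100, replace = TRUE),
  Disposition = sample(c("ER", "Admi", "SomethingElse"), 100, replace = TRUE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one single-row data frame with typed columns per reason.
summarise_reason <- function(r) {
  is_r <- ER$Reason == r
  tot <- sum(is_r)
  adm <- sum(is_r & grepl("Admi|admi|ADMI|ADmi", ER$Disposition))
  data.frame(
    Reason = r,
    Er = sum(is_r & grepl("ER|Er", ER$Disposition)),
    Adm = adm,
    Tot = tot,
    # Avoid dividing by zero when a reason never occurs.
    Rate = if (tot > 0) sprintf("%.0f%%", 100 * adm / tot) else NA_character_,
    stringsAsFactors = FALSE
  )
}

# Bind all rows in a single call instead of growing the data frame in the loop.
output <- do.call(rbind, lapply(Reasons, summarise_reason))
output
```

Because every column keeps its own type (character or integer), no value is coerced to a factor level along the way.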

convert character string into integer for modulo operation

I want to map md5 hashed character strings to weekday numbers (0-6) via modulo operation. Therefore I need to transform the character hashes into integers (numeric). I haven't found a way to output the hashes in byte form instead of ascii strings (via digest package). Any hints with base R or different approaches appreciated.
If you really want to do this, you'll require multiple-precision arithmetic, because a single md5 hash has 128 bits, which is too large to fit into a normal integer value. This can be done using the gmp package.
library('digest');
library('gmp');
as.integer(do.call(c,lapply(strsplit(sapply(letters,digest,'md5'),''), function(x) sum(as.bigz(match(x,c(0:9,letters[1:6]))-1)*as.bigz(16)^((length(x)-1):0)) ))%%7);
## [1] 3 2 1 1 5 5 5 5 1 4 4 6 5 3 5 4 0 2 0 4 5 4 6 3 6 1
Let's break that down:
sapply(letters,digest,'md5')
## a b c ...
## "127a2ec00989b9f7faf671ed470be7f8" "ddf100612805359cd81fdc5ce3b9fbba" "6e7a8c1c098e8817e3df3fd1b21149d1" ...
I wanted to design this algorithm to be vectorized, and decided to use the built-in letters vector as 26 arbitrary input values for demonstration purposes. Unfortunately the dream of a fully vectorized algorithm (i.e. with no hidden loops) was dashed right away, since digest() is not vectorized for some reason, which is why I had to use sapply() here to produce a vector of md5 hashes corresponding to the inputs.
strsplit(...,'')
## $a
## [1] "1" "2" "7" "a" "2" "e" "c" "0" "0" "9" "8" "9" "b" "9" "f" "7" "f" "a" "f" "6" "7" "1" "e" "d" "4" "7" "0" "b" "e" "7" "f" "8"
##
## $b
## [1] "d" "d" "f" "1" "0" "0" "6" "1" "2" "8" "0" "5" "3" "5" "9" "c" "d" "8" "1" "f" "d" "c" "5" "c" "e" "3" "b" "9" "f" "b" "b" "a"
##
## $c
## [1] "6" "e" "7" "a" "8" "c" "1" "c" "0" "9" "8" "e" "8" "8" "1" "7" "e" "3" "d" "f" "3" "f" "d" "1" "b" "2" "1" "1" "4" "9" "d" "1"
## ...
Splits the hashes into character vectors, each element being one hex digit of the hash. We now have a list of 26 character vectors.
lapply(..., function(x) ... )
Process each character vector one at a time. Diving into the function (example output will be given for the value of x corresponding to input string 'a'):
match(x,c(0:9,letters[1:6]))-1
## [1] 1 2 7 10 2 14 12 0 0 9 8 9 11 9 15 7 15 10 15 6 7 1 14 13 4 7 0 11 14 7 15 8
This returns the value of each digit as a plain old integer, by finding the index within the hex digit sequence (c(0:9,letters[1:6])) and subtracting one.
as.bigz(...)
## Big Integer ('bigz') object of length 32:
## [1] 1 2 7 10 2 14 12 0 0 9 8 9 11 9 15 7 15 10 15 6 7 1 14 13 4 7 0 11 14 7 15 8
Cast to big integer, required for the arithmetic we're about to do.
...*as.bigz(16)^((length(x)-1):0)
## Big Integer ('bigz') object of length 32:
## [1] 21267647932558653966460912964485513216 2658455991569831745807614120560689152 581537248155900694395415588872650752 51922968585348276285304963292200960 649037107316853453566312041152512
## [6] 283953734451123385935261518004224 15211807202738752817960438464512 0 0 2785365088392105618523029504
## [11] 154742504910672534362390528 10880332376531662572355584 831136500985057557610496 42501298345826806923264 4427218577690292387840
## [16] 129127208515966861312 17293822569102704640 720575940379279360 67553994410557440 1688849860263936
## [21] 123145302310912 1099511627776 962072674304 55834574848 1073741824
## [26] 117440512 0 720896 57344 1792
## [31] 240 8
Treating the hash as a big-endian hex number, multiply each digit value by its place value.
sum(...)
## Big Integer ('bigz') :
## [1] 24560512346470571536449760694956189688
Add up each place-value-weighted digit value to get the bigz representation of the hash.
This completes the lapply() function. Thus, coming out of the lapply() call is a list of bigz values corresponding to the hashes:
lapply(..., function(x) ... )
## $a
## Big Integer ('bigz') :
## [1] 24560512346470571536449760694956189688
##
## $b
## Big Integer ('bigz') :
## [1] 295010738308890763454498908323798711226
##
## $c
## Big Integer ('bigz') :
## [1] 146851381511772731860674382282097773009
## ...
do.call(c,...)
## Big Integer ('bigz') object of length 26:
## [1] 24560512346470571536449760694956189688 295010738308890763454498908323798711226 146851381511772731860674382282097773009 277896596675540352347406615789605003835 196274166648971101707441276945175337351
## [6] 152164057440943545205375583549802787690 177176961461451259509149953911555923867 104722841650969351697149582356678916643 338417919426764038104581950237023359466 337938589168387959049175020406476846763
## [11] 182882473465429367490220828342074920857 80661780033646501757972845962914093977 251563583963884775614900275564391350478 279860001817578054753205218523665183571 158142488666995307556311659134646734337
## [16] 116423801372716526262639744414150237351 97172586736798383425273805088952414146 316382305028166656556246910315962582893 245775506345085992020540282526076959865 96713787940004003047734284080139522561
## [21] 227309401343419671779216095382349119699 250431221767618781785406207793096585421 33680856367414392588062933086110875192 119974848773126933055729663395967301868 296965764652868210844163281547943654188
## [26] 118199003122415992890118393158735259681
This "unlists" the list. Note: I tried sapply() instead of lapply(), and alternatively unlist(), and neither worked. This is probably related to the bigz class, possibly to the fact that a vector of bigz values is actually weirdly encoded as a single vector of raw.
...%%7
## Big Integer ('bigz') object of length 26:
## [1] 3 2 1 1 5 5 5 5 1 4 4 6 5 3 5 4 0 2 0 4 5 4 6 3 6 1
And finally we can take the result modulo 7.
as.integer(...)
## [1] 3 2 1 1 5 5 5 5 1 4 4 6 5 3 5 4 0 2 0 4 5 4 6 3 6 1
Last step is to convert back to plain old integer from bigz.
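If only the remainder is needed, the full 128-bit value never has to be built. The sketch below is a different technique from the gmp approach above: it reduces the hex string digit by digit (Horner's rule), taking the remainder after every step so that all intermediate values stay small. The function name hex_mod is hypothetical, and the two hash strings are copied from the sapply() output shown earlier rather than recomputed with digest.

```r
# Remainder of a hex string modulo a small integer, without big-number arithmetic.
# hex_mod is an illustrative name, not part of any package.
hex_mod <- function(hex, modulus = 7L) {
  # Value of each hex digit, most significant first.
  digits <- strtoi(strsplit(hex, "")[[1]], base = 16L)
  # Horner's rule: value = (...((d1 * 16 + d2) * 16 + d3)...), reduced at each step.
  Reduce(function(acc, d) (acc * 16L + d) %% modulus, digits, 0L)
}

# Hashes of "a" and "b" as printed in the walkthrough above.
hashes <- c(a = "127a2ec00989b9f7faf671ed470be7f8",
            b = "ddf100612805359cd81fdc5ce3b9fbba")
sapply(hashes, hex_mod)
```

For these two inputs the remainders agree with the first two values of the gmp result (3 and 2).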
