I need to split a data frame based on a condition. For example, I have a data frame my_df with a variable k that has no negative values, and I need to split my_df every time it encounters 0. To make this clearer, below is the code that creates my_df.
my_df <- data.frame("k" = c(0, 0,0, 0.1,1.3,4,5,7,8,11,14,17,10,5,0.4,0,0,0,1.0,2.3,5,7,3,0.1,0))
Running the code above produces the following data frame:
row_number k
1 0
2 0
3 0
4 0.1
5 1.3
6 4
7 5
8 7
9 8
10 11
11 14
12 17
13 10
14 5
15 0.4
16 0
17 0
18 0
19 1.0
20 2.3
21 5
22 7
23 3
24 0.1
25 0
My expected output is the data frame split each time a run of zeros begins after non-zero values.
That is, a new data frame df1 is created containing rows 1 to 15, another data frame df2 containing rows 16 to 24, and another data frame df3 containing row 25, and so on until the end of the data frame.
I found that split() can split a data frame, but I do not know how to express this requirement with it.
From data.table you can use the function rleidv() to create a grouping variable:
library("data.table")
my_df <- data.frame("k" = c(0, 0,0, 0.1,1.3,4,5,7,8,11,14,17,10,5,0.4,0,0,0,1.0,2.3,5,7,3,0.1,0))
split(my_df, (rleidv(my_df$k==0) - 1) %/% 2)
Here is a solution with base R:
r <- rle(my_df$k!=0)
r$values <- gl((length(r$values) + 1) %/% 2, k=2, length=length(r$values))
split(my_df, inverse.rle(r))
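To see why pairing the runs works, it can help to look at them on a smaller vector. This is only an illustrative sketch: the vector k and the names pair_id and grp are made up for the example, and it assumes, like the answers above, that the data starts with a run of zeros.

```r
# Small made-up vector: zeros, non-zeros, zeros, non-zero, zero
k <- c(0, 0, 0.1, 1.3, 0, 0, 2, 0)

# Run-length encode the "is non-zero" flag
r <- rle(k != 0)
r$lengths  # 2 2 2 1 1
r$values   # FALSE TRUE FALSE TRUE FALSE

# Pair each zero run with the non-zero run that follows it:
# runs 1 and 2 form group 1, runs 3 and 4 form group 2, and so on
pair_id <- (seq_along(r$values) + 1) %/% 2

# Expand the run-level group ids back to one id per element
grp <- rep(pair_id, r$lengths)
grp        # 1 1 1 1 2 2 2 3
```

If the data started with non-zero values instead, the pairing would be shifted by one run, so the offset would need adjusting.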
We can create a grouping variable with cumsum and diff, then split my_df on it to get a list of data frames:
lst <- split(my_df, cumsum(c(TRUE, diff(!my_df$k) ==1)))
lapply(lst, row.names)
#$`1`
#[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15"
#$`2`
#[1] "16" "17" "18" "19" "20" "21" "22" "23" "24"
#$`3`
#[1] "25"
NOTE: No packages are used. Only base R methods are used.
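As a sketch of how the grouping vector is built, the one-liner can be unpacked step by step. The names is_zero, starts, grp and pieces are illustrative, and naming the list elements df1, df2, ... is an assumption about how you want to refer to them; keeping them in one list is usually easier than creating separate objects.

```r
my_df <- data.frame(k = c(0, 0, 0, 0.1, 1.3, 4, 5, 7, 8, 11, 14, 17, 10, 5, 0.4,
                          0, 0, 0, 1.0, 2.3, 5, 7, 3, 0.1, 0))

# TRUE for rows where k is zero
is_zero <- my_df$k == 0

# diff() is 1 exactly where a zero follows a non-zero value;
# the leading TRUE makes the first row start the first group
starts <- c(TRUE, diff(is_zero) == 1)

# Each new start bumps the group counter by one
grp <- cumsum(starts)

pieces <- split(my_df, grp)
names(pieces) <- paste0("df", seq_along(pieces))

sapply(pieces, nrow)
# df1 df2 df3
#  15   9   1
```

Individual pieces are then available as pieces$df1, pieces$df2, and so on.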
The code for producing the sample dataset and converting from character to numeric is below:
ff = data.frame(a = c('1','2','3'),b = 1:3, c = 5:7)
#data.frame is a type of list.
fff = list(ff,ff,ff,ff)
k = fff %>% map(~map(.x,function(x){x['a'] %<>% as.numeric
return(x)}))
However, the result looks like this:
Three lists appear in each of the nested lists (3 * 3 = 9), which is very strange.
I think the result should have 3 lists in one nested list (3 * 1 = 3).
What I want is to convert every a in each data frame to numeric.
> k
[[1]]
[[1]]$a
a
"1" "2" "3" NA
[[1]]$b
a
1 2 3 NA
[[1]]$c
a
5 6 7 NA
[[2]]
[[2]]$a
a
"1" "2" "3" NA
[[2]]$b
a
1 2 3 NA
[[2]]$c
a
5 6 7 NA
[[3]]
[[3]]$a
a
"1" "2" "3" NA
[[3]]$b
a
1 2 3 NA
[[3]]$c
a
5 6 7 NA
[[4]]
[[4]]$a
a
"1" "2" "3" NA
[[4]]$b
a
1 2 3 NA
[[4]]$c
a
5 6 7 NA
I cannot understand why I cannot convert a into numeric...
Like this, with mutate:
fff %>%
map(~ mutate(.x, a = as.numeric(a)))
Or, more base R style:
fff %>%
map(\(x) {x$a <- as.numeric(x$a); x})
You should use map only once, because you don't have a nested list. The first map gives you access to each data frame, which is where you can convert the column to numeric. A second map would iterate over the columns of each data frame (which you don't want).
With two maps, it's also preferable to use \ or function rather than ~, because using .x and x for different objects becomes confusing. In your question, .x is the data frame, while x is each of its columns.
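The point about columns can be checked without purrr. The following is a base R sketch, using lapply in place of map and the illustrative name converted for the result: a data frame is itself a list of columns, so iterating over one visits its columns, while a single pass over fff visits whole data frames.

```r
ff <- data.frame(a = c("1", "2", "3"), b = 1:3, c = 5:7,
                 stringsAsFactors = FALSE)
fff <- list(ff, ff, ff, ff)

# A data frame is a list whose elements are its columns
length(ff)  # 3
names(ff)   # "a" "b" "c"

# One pass over the outer list: each d is a whole data frame
converted <- lapply(fff, function(d) {
  d$a <- as.numeric(d$a)  # convert only column a
  d                       # return the modified data frame
})

# Column classes of the first converted data frame
sapply(converted[[1]], class)
#         a         b         c
# "numeric" "integer" "integer"
```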
I have been wrestling with some code for a personal project and have been hitting some roadblocks.
I have some restaurant data and there is a column for the table with information separated by "/".
For example: 4/1 means table 4, and the first check at that table for the day. 10/A/2 means table 10, the check was split into 2 or more checks (A, B, C, etc.), this is check 10/A, and it is turnover 2.
Checks can also be togo orders which may be denoted by the name of the order.
For example, here are some possible orders:
1/1
1/2
10/A/3
10/B/3
Togo
Bob Togo
I want to split them into 1 to 3 columns organized by table (or Togo), split, and turnover. Here is the data:
> check <- c("1/1", "1/2", "10/A/3", "10/B/3", "Togo", "Bob Togo")
> checknum <- seq(1:6)
> dat <- cbind(checknum,check)
> dat
checknum check
[1,] "1" "1/1"
[2,] "2" "1/2"
[3,] "3" "10/A/3"
[4,] "4" "10/B/3"
[5,] "5" "Togo"
[6,] "6" "Bob Togo"
Ideally, I want them to look like this:
> Table <- c(1,1,10,10,"Togo","Bob Togo")
> Split <- c(NA,NA,"A","B",NA,NA)
> Turn <- c(1,2,3,3,NA,NA)
> Ideal <- cbind(checknum,Table,Split,Turn)
> Ideal
checknum Table Split Turn
[1,] "1" "1" NA "1"
[2,] "2" "1" NA "2"
[3,] "3" "10" "A" "3"
[4,] "4" "10" "B" "3"
[5,] "5" "Togo" NA NA
[6,] "6" "Bob Togo" NA NA
Where all columns are for a specific aspect of the check with NAs for missing values.
Numeric values can be left as factors because each acts more like a factor than an integer. Ideally, "Bob Togo" would also be renamed "Togo" so that all Togo orders share the same factor level.
I know this is a lot at once, but I've been hitting roadblocks for over 2 weeks now and I feel I'm missing something simple.
I'm relatively new to R, so any additional explanation with your answer is greatly appreciated.
We can do this with tidyverse by mutating the 'check' column with str_replace and then using separate to split 'check' into three columns:
library(tidyverse)
dat %>%
mutate(check = str_replace(check, "^(\\d+)/(\\d+)$", "\\1/NA/\\2")) %>%
separate( check, into = c("Table", "Split", "Turn"), sep="/", convert = TRUE)
# checknum Table Split Turn
#1 1 1 NA 1
#2 2 1 NA 2
#3 3 10 A 3
#4 4 10 B 3
#5 5 Togo <NA> <NA>
#6 6 Bob Togo <NA> <NA>
NOTE 1: It is better to create a data.frame than a matrix as the initial dataset, so that columns of different classes can be accommodated.
NOTE 2: tidyverse is a collection of packages, so loading it loads all the core packages in that bundle. As @mt1022 suggested, we don't need to load the whole tidyverse; we can instead load dplyr (mutate), tidyr (separate) and stringr (str_replace).
data
dat <- data.frame(checknum,check, stringsAsFactors=FALSE)
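For comparison, here is a base R sketch of the same split that also folds "Bob Togo" into "Togo", as the question asks. It rests on two assumptions that are not in the original answer: a check with three parts is always table/split/turn, and any Table value containing "togo" (in any letter case) is a to-go order. The name parts is illustrative.

```r
check <- c("1/1", "1/2", "10/A/3", "10/B/3", "Togo", "Bob Togo")
dat <- data.frame(checknum = seq_along(check), check = check,
                  stringsAsFactors = FALSE)

# Split every check on "/" into its parts
parts <- strsplit(dat$check, "/", fixed = TRUE)

# First part is always the table (or the to-go label)
dat$Table <- vapply(parts, function(p) p[1], character(1))

# The split letter only exists when there are three parts
dat$Split <- vapply(parts, function(p) {
  if (length(p) == 3) p[2] else NA_character_
}, character(1))

# The turn is the last part, but only when there is more than one part
dat$Turn <- vapply(parts, function(p) {
  if (length(p) >= 2) as.integer(p[length(p)]) else NA_integer_
}, integer(1))

# Assumption: anything mentioning "togo" counts as a to-go order
dat$Table[grepl("togo", dat$Table, ignore.case = TRUE)] <- "Togo"

dat[, c("checknum", "Table", "Split", "Turn")]
```

Table stays a character column here; wrapping it in factor() afterwards would give the shared "Togo" level mentioned in the question.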
Brand new to R programming so please forgive me if I'm using wrong terminologies.
I'm trying to insert/append values to a data frame from inside a for-loop.
I can get the right values if I just print() them, but when I try to put it inside the data frame, I get mostly NA's. If I run this code it prints out the values I want.
output <- data.frame()
for (i in seq_along(Reasons)){
assign(paste(Reasons[i]), sum(ER$Reason == paste(Reasons[i])))
Tot <- get(paste(Reasons[i]))
assign(paste(Reasons[i],'ER',sep="_"), sum(grepl("ER|Er", ER$Disposition) & ER$Reason == paste(Reasons[i])))
Er <- get(paste(Reasons[i],'ER',sep="_"))
assign(paste(Reasons[i],'adm',sep="_"), sum(grepl("Admi|admi|ADMI|ADmi", ER$Disposition) & ER$Reason == paste(Reasons[i])))
Adm <- get(paste(Reasons[i],'adm',sep="_"))
assign(paste(Reasons[i],'admrate',sep="_"), sprintf("%.0f%%", (sum(grepl("Admi|admi|ADMI|ADmi", ER$Disposition) & ER$Reason == paste(Reasons[i])))/(sum(ER$Reason == paste(Reasons[i])))*100))
Rate <- get(paste(Reasons[i],'admrate',sep="_"))
print(c(Er,Adm,Tot,Rate))
#clear variables just created
rm(list=ls(pattern=Reasons[i]))
rm(Tot,Er,Adm,Rate)
}
[1] "7" "13" "20" "65%"
[1] "4" "8" "12" "67%"
[1] "12" "12" "24" "50%"
[1] "23" "7" "30" "23%"
[1] "7" "1" "8" "12%"
[1] "3" "1" "4" "25%"
[1] "3" "0" "3" "0%"
[1] "6" "5" "11" "45%"
[1] "2" "9" "11" "82%"
[1] "2" "4" "6" "67%"
[1] "10" "4" "14" "29%"
[1] "5" "0" "5" "0%"
[1] "10" "4" "14" "29%"
[1] "0" "3" "3" "100%"
[1] "7" "3" "10" "30%"
[1] "0" "4" "4" "100%"
But when I use
output <- rbind(output, c(Er, Adm, Tot, Rate))
Instead of
print(c(Er,Adm,Tot,Rate))
I get the first row of values (7, 13, 20, 65%), then all NA's except the "7" in rows 5 and 15... What am I doing wrong?
Thank you in advance
As I don't know what your data look like I cannot reproduce your error. If I understand it correctly, for each value in Reasons you want to find (a) the total number of observations, (b) the number of observations with the string "Er" in the variable Disposition, (c) the number of observations with the string "Admi" in the variable Disposition and (d) the percentage of observations with the string "Admi" in the variable Disposition. If that is the case then you don't have to use assign and get to do this.
Here is a simpler way to do it (although it's not the best way to do it, see below):
## Here I just generated some data that might look like the data
## you are dealing with:
Reasons <- LETTERS[1:10]
ER <- data.frame(Reason = LETTERS[sample.int(10,100, replace = TRUE)],
Disposition = c("ER", "Admi", "SomethingElse")[sample.int(3,100, replace = TRUE)])
output <- data.frame()
for (i in seq(along = Reasons)){
Tot <- sum(ER$Reason ==Reasons[i])
Er <- sum(grepl("ER|Er", ER$Disposition) & (ER$Reason ==Reasons[i]))
Adm <- sum(grepl("Admi|admi|ADMI|ADmi", ER$Disposition) & (ER$Reason ==Reasons[i]))
Rate <- paste(round(Adm/Tot*100), "%")
output <- rbind(output, c(Er, Adm, Tot, Rate))
}
> output
X.4. X.3. X.10. X.30...
1 4 3 10 30 %
2 2 3 6 50 %
3 2 1 6 17 %
4 5 2 14 14 %
5 3 5 11 45 %
6 2 4 11 36 %
7 3 6 14 43 %
8 2 2 5 40 %
9 1 7 11 64 %
10 4 4 12 33 %
Dynamically appending rows to a data frame or matrix is generally not a very good idea as it is quite memory intensive. If you know the dimensions of your matrix beforehand (as you do) you should initialize it with the right size and then fill the entries inside your loop:
## Initialize data:
output <- matrix(nrow = length(Reasons), ncol = 4)
for (i in seq(along = Reasons)){
Tot <- sum(ER$Reason ==Reasons[i])
Er <- sum(grepl("ER|Er", ER$Disposition) & (ER$Reason ==Reasons[i]))
Adm <- sum(grepl("Admi|admi|ADMI|ADmi", ER$Disposition) & (ER$Reason ==Reasons[i]))
Rate <- paste(round(Adm/Tot*100), "%")
output[i,] <- c(Er, Adm, Tot, Rate)
}
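Because Rate is a string, the matrix above stores every entry as character. A possible variation, assuming you would rather keep the counts as integers, is to preallocate a data frame with typed columns and fill it row by row. This is a sketch: set.seed(1) and the helper name in_reason are additions for the example, and the data is simulated in the same spirit as above.

```r
set.seed(1)  # added only so the simulated data is reproducible
Reasons <- LETTERS[1:10]
ER <- data.frame(
  Reason = sample(Reasons, 100, replace = TRUE),
  Disposition = sample(c("ER", "Admi", "SomethingElse"), 100, replace = TRUE),
  stringsAsFactors = FALSE
)

# Preallocate one row per reason, with a fixed type per column
n <- length(Reasons)
output <- data.frame(Reason = Reasons,
                     Er = integer(n), Adm = integer(n), Tot = integer(n),
                     Rate = character(n),
                     stringsAsFactors = FALSE)

for (i in seq_along(Reasons)) {
  in_reason <- ER$Reason == Reasons[i]  # rows belonging to this reason
  output$Tot[i] <- sum(in_reason)
  output$Er[i]  <- sum(in_reason & grepl("ER|Er", ER$Disposition))
  output$Adm[i] <- sum(in_reason & grepl("Admi|admi|ADMI|ADmi", ER$Disposition))
  output$Rate[i] <- sprintf("%.0f%%", 100 * output$Adm[i] / output$Tot[i])
}

str(output)
```

A reason with no observations would give a Rate of "NaN%", so real data may need a guard for Tot equal to zero.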
There are, however, even simpler ways to do this kind of evaluation. You could, for example, use the dplyr package, where you can group the data by a variable (the different values of ER$Reason in your case) and then compute the values you need:
## Load the package 'dplyr'
library(dplyr)
## Group the variable and evaluate:
output <- ER %>% group_by(Reason) %>%
dplyr::summarise(Er = sum(grepl("ER|Er", Disposition)),
Adm = sum(grepl("Admi|admi|ADMI|ADmi", Disposition)),
Tot = n(),
Rate = paste(round(Adm/Tot*100), "%"))
> output
# A tibble: 10 × 5
Reason Er Adm Tot Rate
<chr> <int> <int> <int> <chr>
1 A 4 3 10 30 %
2 B 2 3 6 50 %
3 C 2 1 6 17 %
4 D 5 2 14 14 %
5 E 3 5 11 45 %
6 F 2 4 11 36 %
7 G 3 6 14 43 %
8 H 2 2 5 40 %
9 I 1 7 11 64 %
10 J 4 4 12 33 %
I have a vector of n observations. Now I need to create all the possible combinations with those n elements. For example, my vector is
a<-1:4
In my output, combinations should be like,
1
2
3
4
12
13
14
23
24
34
123
124
134
234
1234
How can I get this output?
Thanks in advance.
Something like this could work:
unlist(sapply(1:4, function(x) apply(combn(1:4, x), 2, paste, collapse = '')))
First we get the combinations using combn and then we paste the outputs together. Finally, unlist gives us a vector with the output we need.
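The same idea can be wrapped in a small helper so it works for any vector rather than the hard-coded 1:4. This is a sketch; all_combinations is a made-up name, and lapply is used instead of sapply so the result is always a list before unlist flattens it. One caveat: combn treats a single positive number as seq_len of that number, so a length-one numeric input would need special handling.

```r
# Hypothetical helper: all non-empty combinations of x, pasted into strings
all_combinations <- function(x) {
  unlist(lapply(seq_along(x), function(m) {
    # combn() returns one combination per column; paste each column together
    apply(combn(x, m), 2, paste, collapse = "")
  }))
}

all_combinations(1:4)
```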
Output:
[1] "1" "2" "3" "4" "12" "13" "14" "23" "24" "34" "123" "124"
"134" "234" "1234"
My data has many columns with different names. I want to extract only the numeric values from the column name_id and store them in z.
z should contain only the numeric values of name_id; any value containing letters should not be stored in z.
z <- unique(data$name_id)
z
#[1] 10 11 12 13 14 3 4 5 6 7 8 9
#Levels: 10 11 12 13 14 3 4 5 6 7 8 9 a b c d e f
When I tried this:
z <- unique(as.numeric(data$name_id))
z
# [1] 1 2 3 4 5 6 7 8 9 10 11 12
The output only contains values up to 12, but the column also has values greater than 12.
Suppose your values are in a character vector b:
> b
[1] "1" "2" "3" "4" "5" "13" "14" "15" "45" "567" "999" "Name" "Age"
Apply this (str_extract comes from the stringr package):
> library(stringr)
> regexp <- "[[:digit:]]+"
> z <- str_extract(b, regexp)
> z[is.na(z)] <- ""
> z
[1] "1" "2" "3" "4" "5" "13" "14" "15" "45" "567" "999" "" ""
Hope this helps .
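The values 1 to 12 seen in the question are the internal integer codes of the factor levels, not the labels themselves. A minimal sketch of the usual workaround, assuming name_id is a factor as the Levels: line suggests, is to convert to character before converting to numeric. The sample vector and the name vals are made up for illustration.

```r
# Made-up factor mixing numeric-looking and alphabetic labels
name_id <- factor(c("10", "11", "3", "a", "b", "14", "10"))

# as.numeric() on a factor returns level codes, not the labels
as.numeric(name_id)

# Go through character first; non-numeric labels become NA (warning silenced)
vals <- suppressWarnings(as.numeric(as.character(name_id)))

# Keep the unique values that really were numbers
z <- unique(vals[!is.na(vals)])
z
# [1] 10 11  3 14
```

Unlike the regular-expression approach, this returns a numeric vector and drops the alphabetic entries instead of replacing them with empty strings.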