Using do() with names of list elements - r

I am trying to take the names of list elements and use do() to apply a function over them all, then bind them in a single data frame.
require(XML)
require(magrittr)
url <- "http://gd2.mlb.com/components/game/mlb/year_2016/month_05/day_21/gid_2016_05_21_milmlb_nynmlb_1/boxscore.xml"
box <- xmlParse(url)
xml_data <- xmlToList(box)
end <- length(xml_data[[2]]) - 1
x <- seq(1:end)
away_pitchers_names <- paste0("xml_data[[2]][", x, "]")
away_pitchers_names <- as.data.frame(away_pitchers_names)
names(away_pitchers_names) <- "elements"
away_pitchers_names$elements %<>% as.character()
listTodf <- function(x) {
df <- as.data.frame(x)
tdf <- as.data.frame(t(df))
row.names(tdf) <- NULL
tdf
}
test <- away_pitchers_names %>% group_by(elements) %>% do(listTodf(.$elements))
When I run the listTodf function on a list element it works fine:
listTodf(xml_data[[2]][1])
id name name_display_first_last pos out bf er r h so hr bb np s w l sv bs hld s_ip s_h s_r s_er s_bb
1 605200 Davies Zach Davies P 16 22 4 4 5 5 2 2 86 51 1 3 0 0 0 36.0 41 24 23 15
s_so game_score era
1 25 45 5.75
But when I try to loop through the names of the elements with the do() function I get the following:
Warning message:
In rbind_all(out[[1]]) : Unequal factor levels: coercing to character
And here is the output:
> test
Source: local data frame [5 x 2]
Groups: elements [5]
elements V1
(chr) (chr)
1 xml_data[[2]][1] xml_data[[2]][1]
2 xml_data[[2]][2] xml_data[[2]][2]
3 xml_data[[2]][3] xml_data[[2]][3]
4 xml_data[[2]][4] xml_data[[2]][4]
5 xml_data[[2]][5] xml_data[[2]][5]
I am sure it is something extremely simple, but I can't figure out where things are getting tripped up.

To evaluate the strings, eval(parse(text = ...)) can be used:
library(dplyr)
lapply(away_pitchers_names$elements,
function(x) as.data.frame.list(eval(parse(text=x))[[1]], stringsAsFactors=FALSE)) %>%
bind_rows()
# id name name_display_first_last pos out bf er r h so hr bb np s w l
#1 605200 Davies Zach Davies P 16 22 4 4 5 5 2 2 86 51 1 3
#2 430641 Boyer Blaine Boyer P 2 4 0 0 2 0 0 0 8 7 1 0
#3 448614 Torres, C Carlos Torres P 3 4 0 0 0 1 0 2 21 11 0 1
#4 592804 Thornburg Tyler Thornburg P 3 3 0 0 0 1 0 0 14 8 2 1
#5 518468 Blazek Michael Blazek P 1 5 1 1 2 0 0 2 23 10 1 1
# sv bs hld s_ip s_h s_r s_er s_bb s_so game_score era loss note
#1 0 0 0 36.0 41 24 23 15 25 45 5.75 <NA> <NA>
#2 0 1 0 21.1 22 4 4 5 7 48 1.69 <NA> <NA>
#3 0 0 2 22.1 22 9 9 14 21 52 3.63 <NA> <NA>
#4 1 2 8 18.2 13 8 8 7 29 54 3.86 <NA> <NA>
#5 0 1 8 21.1 23 6 6 14 18 41 2.53 true (L, 1-1)
However, it is easier and faster to just do
lapply(xml_data[[2]][1:5], function(x)
as.data.frame.list(x, stringsAsFactors=FALSE)) %>%
bind_rows()
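As a rough illustration of why the do() attempt only echoed the strings back, here is a minimal base R sketch. The pitchers list is a made-up stand-in for xml_data[[2]] (an assumption, so that the example runs without the XML package or a network connection). A string that looks like code stays a string until it is parsed and evaluated, and looping over the list itself avoids the strings altogether.

```r
# Hypothetical stand-in for xml_data[[2]]: one named character vector per pitcher
pitchers <- list(
  c(id = "605200", name = "Davies", er = "4"),
  c(id = "430641", name = "Boyer",  er = "0")
)

# A string that looks like code is still only a string
code_string <- "pitchers[[1]]"
is.character(code_string)

# It becomes the list element only after parsing and evaluating
evaluated <- eval(parse(text = code_string))

# Simpler: iterate over the list directly and bind the one-row data frames
rows <- lapply(pitchers, function(p) {
  as.data.frame(as.list(p), stringsAsFactors = FALSE)
})
result <- do.call(rbind, rows)
rownames(result) <- NULL
result
```

All columns come out as character here, just as they do when read from XML, so numeric columns would still need converting afterwards.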

Related

find first occurrence in two variables in df

I need to find the first two times my df meets a certain condition grouped by two variables. I am trying to use the ddply function, but I am doing something wrong with the ".variables" command.
So in this example, I'm trying to find the first two times x > 30 and y > 30 in each group / trial.
The way I'm using ddply is giving me the first two times in the dataset, then repeating that for every group.
set.seed(1)
df <- data.frame((matrix(nrow=200,ncol=5)))
colnames(df) <- c("group","trial","x","y","hour")
df$group <- rep(c("A","B","C","D"),each=50)
df$trial <- rep(c(rep(1,times=25),rep(2,times=25)),times=4)
df[,3:4] <- runif(400,0,50)
df$hour <- rep(1:25,time=8)
library(plyr)
ddply(.data=df, .variables=c("group","trial"), .fun=function(x) {
i <- which(df$x > 30 & df$y >30 )[1:2]
if (!is.na(i)) x[i, ]
})
Expected results:
group trial x y hour
13 A 1 34.3511423 38.161134 13
15 A 1 38.4920710 40.931734 15
36 A 2 33.4233369 34.481392 11
37 A 2 39.7119930 34.470671 12
52 B 1 43.0604738 46.645491 2
65 B 1 32.5435234 35.123126 15
But instead, my code is finding c(1,4) from the first group*trial and repeating that over for every group*trial:
group trial x y hour
1 A 1 34.351142 38.161134 13
2 A 1 38.492071 40.931734 15
3 A 2 5.397181 27.745031 13
4 A 2 20.563721 22.636003 15
5 B 1 22.953286 13.898301 13
6 B 1 32.543523 35.123126 15
I would also like for there to be rows of NA if a second occurrence isn't present in a group*trial.
Thanks,
I think this is what you want:
library(tidyverse)
df %>% group_by(group, trial) %>% filter(x > 30 & y > 30) %>% slice(1:2)
Result:
# A tibble: 16 x 5
# Groups: group, trial [8]
group trial x y hour
<chr> <dbl> <dbl> <dbl> <int>
1 A 1 33.5 46.3 4
2 A 1 32.6 42.7 11
3 A 2 35.9 43.6 4
4 A 2 30.5 42.7 14
5 B 1 33.0 38.1 2
6 B 1 40.5 30.4 7
7 B 2 48.6 33.2 2
8 B 2 34.1 30.9 4
9 C 1 33.0 45.1 1
10 C 1 30.3 36.7 17
11 C 2 44.8 33.9 1
12 C 2 41.5 35.6 6
13 D 1 44.2 34.3 12
14 D 1 39.1 40.0 23
15 D 2 39.4 47.5 4
16 D 2 42.1 40.1 10
(slightly different from your results, probably a different R version)
I recommend using dplyr or data.table rather than plyr. From the plyr GitHub page:
plyr is retired: this means only changes necessary to keep it on CRAN
will be made. We recommend using dplyr (for data frames) or purrr (for
lists) instead.
Since someone has already provided a solution with dplyr, here is one option with data.table.
In the data.table call df[i, j, by], I select the rows that match your criteria in i, group by the given variables in by, and in j take the first two rows (head) of each group-specific subset of the data, .SD. Everything inside the brackets is data.table-specific and only works because I first converted df to a data.table with setDT.
library(data.table)
setDT(df)
df[x > 30 & y > 30, head(.SD, 2), by = .(group, trial)]
# group trial x y hour
# 1: A 1 34.35114 38.16113 13
# 2: A 1 38.49207 40.93173 15
# 3: A 2 33.42334 34.48139 11
# 4: A 2 39.71199 34.47067 12
# 5: B 1 43.06047 46.64549 2
# 6: B 1 32.54352 35.12313 15
# 7: B 2 48.03090 38.53685 5
# 8: B 2 32.11441 49.07817 18
# 9: C 1 32.73620 33.68561 1
# 10: C 1 32.00505 31.23571 20
# 11: C 2 32.13977 40.60658 9
# 12: C 2 34.13940 49.47499 16
# 13: D 1 36.18630 34.94123 19
# 14: D 1 42.80658 46.42416 23
# 15: D 2 37.05393 43.24038 3
# 16: D 2 44.32255 32.80812 8
To try a solution that is closer to what you've tried so far we can do the following
ddply(.data=df, .variables=c("group","trial"), .fun=function(df_temp) {
i <- which(df_temp$x > 30 & df_temp$y >30 )[1:2]
df_temp[i, ]
})
Some explanation
One problem with the code you provided is that you used df inside ddply. You defined .fun = function(x) but looked for cases of x > 30 & y > 30 in the full df rather than in the group subset x. As a result, i holds row positions computed on df, yet it is used to index x. Finally, to my understanding there is no need for if (!is.na(i)) x[i, ]. If only one row meets your condition, you will get a row of NAs anyway, because which(df_temp$x > 30 & df_temp$y > 30)[1:2] pads the index with NA.
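The NA padding described above can be seen on a tiny made-up data frame (toy and its values are assumptions for illustration only). Indexing the result of which() with [1:2] always returns two positions, filling with NA when there are fewer matches, whereas head(..., 2) returns only the matches that exist. Note also that in R 4.2.0 and later, an if() condition of length greater than one is an error, which is another reason to drop the if (!is.na(i)) test.

```r
# Hypothetical toy group in which only one row satisfies both conditions
toy <- data.frame(x = c(10, 35, 20, 12),
                  y = c(40, 45, 50, 31))

hits <- which(toy$x > 30 & toy$y > 30)  # positions of matching rows

idx_padded <- hits[1:2]     # always length 2, padded with NA if needed
idx_head   <- head(hits, 2) # at most 2, never padded

toy[idx_padded, ]  # second row is all NA
toy[idx_head, ]    # only the single matching row
```

Which form to use depends on whether the placeholder NA rows are wanted, as the question requests, or not.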
Using dplyr, you can also do:
df %>%
group_by(group, trial) %>%
slice(which(x > 30 & y > 30)[1:2])
group trial x y hour
<chr> <dbl> <dbl> <dbl> <int>
1 A 1 34.4 38.2 13
2 A 1 38.5 40.9 15
3 A 2 33.4 34.5 11
4 A 2 39.7 34.5 12
5 B 1 43.1 46.6 2
6 B 1 32.5 35.1 15
7 B 2 48.0 38.5 5
8 B 2 32.1 49.1 18
Since everything else is covered here is a base R version using split
output <- do.call(rbind, lapply(split(df, list(df$group, df$trial)),
function(new_df) new_df[with(new_df, head(which(x > 30 & y > 30), 2)), ]
))
rownames(output) <- NULL
output
# group trial x y hour
#1 A 1 34.351 38.161 13
#2 A 1 38.492 40.932 15
#3 B 1 43.060 46.645 2
#4 B 1 32.544 35.123 15
#5 C 1 32.736 33.686 1
#6 C 1 32.005 31.236 20
#7 D 1 36.186 34.941 19
#8 D 1 42.807 46.424 23
#9 A 2 33.423 34.481 11
#10 A 2 39.712 34.471 12
#11 B 2 48.031 38.537 5
#12 B 2 32.114 49.078 18
#13 C 2 32.140 40.607 9
#14 C 2 34.139 49.475 16
#15 D 2 37.054 43.240 3
#16 D 2 44.323 32.808 8

R - Issues while calling a user-defined function

I have the following dataframe named "dataset"
> dataset
V1 V2 V3 V4 V5 V6 V7
1 A 29 27 0 14 21 163
2 W 70 40 93 63 44 1837
3 E 11 1 11 49 17 315
4 S 20 59 36 23 14 621
5 C 12 7 48 24 25 706
6 B 14 8 78 27 17 375
7 G 12 7 8 4 4 257
8 T 0 0 0 0 0 0
9 N 32 6 9 14 17 264
10 R 28 46 49 55 38 608
11 O 12 2 8 12 11 450
I have two helper functions as below
get_A <- function(p){
return(data.frame(Scorecard = p,
Results = dataset[nrow(dataset),(p+1)]))
} #Pulls the value from the last row for a given value of (p and offset by 1)
get_P <- function(p){
return(data.frame(Scorecard= p,
Results = dataset[p,ncol(dataset)]))
} #Pulls the value from the last column for a given value of p
I have the following dataframe on which I need to run the above helper functions. There will be NAs because I'm reading this "data_sub" dataframe from an excel file which can have unequal rows for the two columns.
> data_sub
Key_P Key_A
1 2 1
2 3 3
3 4 5
4 NA NA
When I call the helper functions, I get some strange results as shown below:
> get_P(data_sub[complete.cases(data_sub$Key_P),]$Key_P)
Scorecard Results
1 2 1837
2 3 315
3 4 621
> get_A(data_sub[complete.cases(data_sub$Key_A),]$Key_A)
Scorecard Results.V2 Results.V4 Results.V6
1 1 12 8 11
2 3 12 8 11
3 5 12 8 11
Warning message:
In data.frame(Scorecard = p, Results = dataset[nrow(dataset), (p + :
row names were found from a short variable and have been discarded
The call to the get_P() helper function is working the way I want. I'm getting the "Results" for each non-NA value in data_sub$Key_P as a dataframe.
But the call to the get_A() helper function is giving strange results and also a warning. I was expecting it to give a dataframe similar to the one returned by get_P(). Why is this happening, and how can I make get_A() give the correct dataframe? Basically, the output should be
Scorecard Results
1 1 12
2 3 8
3 5 11
I found this link related to the warning but it's unhelpful in solving my issue.
The following works
get_P <- function(df, data_sub) {
data_sub <- data_sub[complete.cases(data_sub), ]
data.frame(
Scorecard = data_sub$Key_P,
Results = df[data_sub$Key_P, ncol(df)])
}
get_P(df, data_sub)
# Scorecard Results
#1 2 1837
#2 3 315
#3 4 621
get_A <- function(df, data_sub) {
data_sub <- data_sub[complete.cases(data_sub), ];
data.frame(
Scorecard = data_sub$Key_A,
Results = as.numeric(df[nrow(df), data_sub$Key_A + 1]))
}
get_A(df, data_sub)
# Scorecard Results
#1 1 12
#2 3 8
#3 5 11
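To avoid the warning, we use as.numeric in get_A to convert the one-row data frame returned by df[nrow(df), ...] into a plain numeric vector, which also strips its row names.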
Another tip: It's better coding practice to make get_P and get_A a function of both df and data_sub to avoid global variables.
Sample data
df <- read.table(text =
" V1 V2 V3 V4 V5 V6 V7
1 A 29 27 0 14 21 163
2 W 70 40 93 63 44 1837
3 E 11 1 11 49 17 315
4 S 20 59 36 23 14 621
5 C 12 7 48 24 25 706
6 B 14 8 78 27 17 375
7 G 12 7 8 4 4 257
8 T 0 0 0 0 0 0
9 N 32 6 9 14 17 264
10 R 28 46 49 55 38 608
11 O 12 2 8 12 11 450", header = T, row.names = 1)
data_sub <- read.table(text =
" Key_P Key_A
1 2 1
2 3 3
3 4 5
4 NA NA", header = T, row.names = 1)
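To see why get_A needed as.numeric while get_P did not, here is a small sketch on a made-up data frame (toy is an assumption, not the question's data). Selecting one row across several columns keeps a data frame, while selecting several rows from a single column drops to a plain vector.

```r
# Hypothetical toy data with three numeric columns
toy <- data.frame(V2 = c(29, 12), V3 = c(27, 2), V4 = c(0, 8))

# One row, several columns: the result is still a (one-row) data frame
last_row <- toy[nrow(toy), c(1, 3)]
is.data.frame(last_row)

# Several rows, one column: the result is dropped to a numeric vector
last_col <- toy[c(1, 2), ncol(toy)]
is.numeric(last_col)

# as.numeric flattens the one-row data frame into an unnamed vector
as_vec <- as.numeric(last_row)

# which can then be paired row by row with the keys
out <- data.frame(Scorecard = c(1, 3), Results = as_vec)
out
```

Passing the one-row data frame straight to data.frame() is what produced the Results.V2, Results.V4, ... columns and the row-names warning in the question.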

How to insert a row which calculates the average of the rows above it?

I am looking to separate rows of data by Cue and add a row after each group that holds the column averages for that group. Here is an example:
Before:
Cue ITI a b c
1 0 16 0.82062 0.52185 0.27679
2 0 24 0.53894 0.49957 0.35767
3 4 22 0.26855 0.17487 0.22461
4 4 20 0.15106 0.48767 0.49072
5 7 18 0.11627 0.12604 0.2832
6 7 24 0.50201 0.14252 0.21454
7 12 16 0.27649 0.96008 0.42114
8 12 18 0.60852 0.21637 0.18799
9 22 20 0.32867 0.65308 0.29388
10 22 24 0.25726 0.37048 0.32379
After:
Cue ITI a b c
1 0 16 0.82062 0.52185 0.27679
2 0 24 0.53894 0.49957 0.35767
3 0.67978 0.51071 0.31723
4 4 22 0.26855 0.17487 0.22461
5 4 20 0.15106 0.48767 0.49072
6 0.209 0.331 0.357
7 7 18 0.11627 0.12604 0.2832
8 7 24 0.50201 0.14252 0.21454
9 0.309 0.134 0.248
10 12 16 0.27649 0.96008 0.42114
11 12 18 0.60852 0.21637 0.18799
12 0.442 0.588 0.304
13 22 20 0.32867 0.65308 0.29388
14 22 24 0.25726 0.37048 0.32379
15 0.292 0.511 0.308
So in the "after" example, line 3 is the average of lines 1 and 2 (line 6 is the average of lines 4 and 5, etc...).
Any help/information would be greatly appreciated!
Thank you!
You can use base R to do something like:
Reduce(rbind,by(data,data[1],function(x)rbind(x,c(NA,NA,colMeans(x[-(1:2)])))))
Cue ITI a b c
1 0 16 0.820620 0.521850 0.276790
2 0 24 0.538940 0.499570 0.357670
3 NA NA 0.679780 0.510710 0.317230
32 4 22 0.268550 0.174870 0.224610
4 4 20 0.151060 0.487670 0.490720
31 NA NA 0.209805 0.331270 0.357665
5 7 18 0.116270 0.126040 0.283200
6 7 24 0.502010 0.142520 0.214540
33 NA NA 0.309140 0.134280 0.248870
7 12 16 0.276490 0.960080 0.421140
8 12 18 0.608520 0.216370 0.187990
34 NA NA 0.442505 0.588225 0.304565
9 22 20 0.328670 0.653080 0.293880
10 22 24 0.257260 0.370480 0.323790
35 NA NA 0.292965 0.511780 0.308835
Here is one idea. Split the data frame, perform the analysis, and then combine them together.
DF_list <- split(DF, f = DF$Cue)
DF_list2 <- lapply(DF_list, function(x){
df_temp <- as.data.frame(t(colMeans(x[, -c(1, 2)])))
df_temp[, c("Cue", "ITI")] <- NA
df <- rbind(x, df_temp)
return(df)
})
DF2 <- do.call(rbind, DF_list2)
rownames(DF2) <- 1:nrow(DF2)
DF2
# Cue ITI a b c
# 1 0 16 0.820620 0.521850 0.276790
# 2 0 24 0.538940 0.499570 0.357670
# 3 NA NA 0.679780 0.510710 0.317230
# 4 4 22 0.268550 0.174870 0.224610
# 5 4 20 0.151060 0.487670 0.490720
# 6 NA NA 0.209805 0.331270 0.357665
# 7 7 18 0.116270 0.126040 0.283200
# 8 7 24 0.502010 0.142520 0.214540
# 9 NA NA 0.309140 0.134280 0.248870
# 10 12 16 0.276490 0.960080 0.421140
# 11 12 18 0.608520 0.216370 0.187990
# 12 NA NA 0.442505 0.588225 0.304565
# 13 22 20 0.328670 0.653080 0.293880
# 14 22 24 0.257260 0.370480 0.323790
# 15 NA NA 0.292965 0.511780 0.308835
DATA
DF <- read.table(text = " Cue ITI a b c
1 0 16 0.82062 0.52185 0.27679
2 0 24 0.53894 0.49957 0.35767
3 4 22 0.26855 0.17487 0.22461
4 4 20 0.15106 0.48767 0.49072
5 7 18 0.11627 0.12604 0.2832
6 7 24 0.50201 0.14252 0.21454
7 12 16 0.27649 0.96008 0.42114
8 12 18 0.60852 0.21637 0.18799
9 22 20 0.32867 0.65308 0.29388
10 22 24 0.25726 0.37048 0.32379", header = TRUE)
A data.table approach, but if someone can offer some improvements I'd be keen to hear.
library(data.table)
dt <- data.table(df)
dt2 <- dt[, lapply(.SD, mean), by = Cue][,ITI := NA][]
data.table(rbind(dt, dt2))[order(Cue)][is.na(ITI), Cue := NA][]
> data.table(rbind(dt, dt2))[order(Cue)][is.na(ITI), Cue := NA][]
Cue ITI a b c
1: 0 16 0.820620 0.521850 0.276790
2: 0 24 0.538940 0.499570 0.357670
3: NA NA 0.679780 0.510710 0.317230
4: 4 22 0.268550 0.174870 0.224610
5: 4 20 0.151060 0.487670 0.490720
6: NA NA 0.209805 0.331270 0.357665
If you want to leave the Cue values as-is to confirm group, just drop the [is.na(ITI), Cue := NA] from the last line.
I would use group_by and summarise from the dplyr package to get a data frame with the average values. Then rbind the new data frame with the old one and sort by Cue:
library(dplyr)
df_averages <- df_orig %>%
group_by(Cue) %>%
summarise(ITI = NA, a = mean(a), b = mean(b), c = mean(c)) %>%
ungroup()
df_all <- rbind(df_orig, df_averages)
# sort by Cue; order() is stable, so each average row follows its group's rows
df_all <- df_all[order(df_all$Cue), ]

Using aggregate in a dataframe with NA without dropping rows [duplicate]

This question already has an answer here:
Blend of na.omit and na.pass using aggregate?
(1 answer)
Closed 5 years ago.
I am using aggregate to get the means of several variables by a specific category (cy), but there are a few NA's in my dataframe. I am using aggregate rather than ddply because, from my understanding, it takes care of NA's similarly to using na.rm=TRUE. The problem is that it drops all rows containing NA from the calculation, so the means are slightly off.
Dataframe:
bt cy cl pf ne YH YI
1 1 H 1 95 70.0 20 20
2 2 H 1 25 70.0 46 50
3 1 H 1 0 70.0 40 45
4 2 H 1 95 59.9 40 40
5 2 H 1 75 59.9 36 57
6 2 H 1 5 70.0 35 43
7 1 H 1 50 59.9 20 36
8 2 H 1 95 59.9 40 42
9 3 H 1 95 49.5 17 48
10 2 H 1 5 70.0 42 42
11 2 H 1 95 49.5 19 30
12 3 H 1 25 49.5 33 51
13 1 H 1 75 49.5 5 26
14 1 H 1 5 70.0 35 37
15 1 H 1 5 59.9 20 40
16 2 H 1 95 49.5 29 53
17 2 H 1 75 70.0 41 41
18 2 H 1 0 70.0 10 10
19 2 H 1 95 49.5 25 32
20 1 H 1 95 59.9 10 11
21 2 H 1 0 29.5 20 28
22 1 H 1 95 29.5 11 27
23 2 H 1 25 59.9 26 26
24 1 H 1 5 70.0 30 30
25 3 H 1 25 29.5 20 30
26 3 H 1 50 70.0 5 5
27 1 H 1 0 59.9 3 10
28 1 K 1 5 49.5 25 29
29 2 K 1 0 49.5 30 32
30 1 K 1 95 49.5 13 24
31 1 K 1 0 39.5 13 13
32 2 M 1 NA 70.0 45 50
33 3 M 1 25 59.9 3 34
The full dataframe has 74 rows, and there are NA's peppered throughout all but two columns (cy and cl).
My code looks like this:
meancnty<-(aggregate(cbind(pf,ne,YH,YI)~cy, data = newChart, FUN=mean))
I double checked in excel, and the means this function produces are for a dataset of N=69, after removing all rows containing NA's. Is there any way to tell R to ignore the NA's rather than remove the rows, other than taking the mean of each variable by county (I have a lot of variables to summarize by many different categories)?
Thank you
using dplyr
df %>%
group_by(cy) %>%
summarize_all(mean, na.rm = TRUE)
# cy bt cl pf ne YH YI
# 1 H 1.785714 0.7209302 53.41463 51.75952 21.92857 29.40476
# 2 K 1.333333 0.8333333 33.33333 47.83333 20.66667 27.33333
# 3 M 1.777778 0.4444444 63.75000 58.68889 24.88889 44.22222
# 4 O 2.062500 0.8750000 31.66667 53.05333 18.06667 30.78571
I think this will work:
meancnty <- aggregate(with(newChart, cbind(pf, ne, YH, YI)),
by = list(newChart$cy), FUN = mean, na.rm = TRUE)
I used the following test data:
> q<- data.frame(y = sample(c(0,1), 10, replace=T), a = runif(10, 1, 100), b=runif(10, 20,30))
> q$a[c(2, 5, 7)]<- NA
> q$b[c(1, 3, 4)]<- NA
> q
y a b
1 0 86.87961 NA
2 0 NA 22.39432
3 0 89.38810 NA
4 0 12.96266 NA
5 1 NA 22.07757
6 0 73.96121 24.13154
7 0 NA 22.31431
8 1 62.77095 21.46395
9 0 55.28476 23.14393
10 0 14.01912 28.08305
Using your code from above, I get:
> aggregate(cbind(a,b)~y, data=q, mean, na.rm=T)
y a b
1 0 47.75503 25.11951
2 1 62.77095 21.46395
which is wrong, i.e. it deletes all rows with any NAs and then takes the mean.
This however gave the right result:
> aggregate(with(q, cbind(a, b)), by = list(q$y), mean, na.rm=T)
Group.1 a b
1 0 55.41591 24.01343
2 1 62.77095 21.77076
It did na.rm=T by column first, and then took the average by group.
Unfortunately, I have no idea why that is, but my guess is that it has to do with the class of y.
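A likely explanation for the difference is that the formula method of aggregate has the argument na.action = na.omit by default, so incomplete rows are dropped before FUN is ever called. The sketch below uses a small made-up data frame (q and its values are assumptions) to compare the default with na.action = na.pass, which keeps every row and leaves the NA handling to na.rm = TRUE inside mean.

```r
# Hypothetical test data with NAs in different rows of a and b
q <- data.frame(
  y = c(0, 0, 0, 1, 1),
  a = c(10, NA, 30, 40, 50),
  b = c(1, 2, NA, 4, 5)
)

# Default na.action = na.omit: rows 2 and 3 are removed entirely first
dropped <- aggregate(cbind(a, b) ~ y, data = q, FUN = mean, na.rm = TRUE)

# na.pass keeps all rows; na.rm = TRUE then skips NAs column by column
kept <- aggregate(cbind(a, b) ~ y, data = q, FUN = mean,
                  na.rm = TRUE, na.action = na.pass)

dropped
kept
```

With na.pass, the formula interface gives per-column means equivalent to the by = list(...) form, while keeping the original column name for the grouping variable.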

Calculating table in R with uneven length

I have two tables of data in R
a = Duration (-10,0] (0,0.25] (0.25,0.5] (0.5,10]
1 2 0 0 0 2
2 3 0 0 10 3
3 4 0 51 25 0
4 5 19 129 14 0
5 6 60 137 1 0
6 7 31 62 15 5
7 8 7 11 7 0
and
b = Duration (-10,0] (0,0.25] (0.25,0.5] (0.5,10]
1 1 0 0 1 266
2 2 1 0 47 335
3 3 1 26 415 142
4 4 3 965 508 5
5 5 145 2535 103 0
6 6 939 2239 15 6
7 7 420 613 86 34
8 8 46 84 36 16
I would like to calculate b/a by matching the duration. I thought of something like ifelse(), but it does not work. Can someone please help me?
Thanks a lot
Match the order and selection of b with a (in my example y with x). Then do the math.
x <- data.frame(duration = 2:8, v = rnorm(7))
y <- data.frame(duration = 8:1, v = rnorm(8))
m <- match(y$duration, x$duration)
ym <- y[m[!is.na(m)],]
x$v/ym$v
It does not work when x contains items that are not in y, btw.
Do you want something like the following:
a <- a[-1]
b <- b[-1]
a <- a[order(a$Duration),]
b <- b[order(b$Duration),]
durations <- intersect(a$Duration, b$Duration)
b[b$Duration %in% durations,] / a[a$Duration %in% durations,]
Duration (-10,0] (0,0.25] (0.25,0.5] (0.5,10]
2 1 Inf NaN Inf 167.50000
3 1 Inf Inf 41.500000 47.33333
4 1 Inf 18.921569 20.320000 Inf
5 1 7.631579 19.651163 7.357143 NaN
6 1 15.650000 16.343066 15.000000 Inf
7 1 13.548387 9.887097 5.733333 6.80000
8 1 6.571429 7.636364 5.142857 Inf
You may want to replace the NaN and Inf values with something else.
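Another way to line up the two tables by duration is merge, sketched here on made-up single-count tables (a, b and the column n are assumptions standing in for the real tables). By default merge keeps only the durations present in both tables, so rows that exist in only one of them are dropped rather than misaligned.

```r
# Hypothetical tables: b has a Duration (1) that a lacks
a <- data.frame(Duration = 2:4, n = c(2, 10, 50))
b <- data.frame(Duration = 1:4, n = c(266, 4, 30, 100))

# Inner join on Duration; suffixes tell the two count columns apart
both <- merge(a, b, by = "Duration", suffixes = c("_a", "_b"))

# Row-wise ratio b/a for the matched durations only
both$ratio <- both$n_b / both$n_a
both
```

With several count columns, the same division can be applied column by column after the merge, and divisions by zero will still produce Inf or NaN.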

Resources