R: converting fractions into decimals in a data frame

I am trying to convert a data frame of numbers stored as characters in fraction form into numbers stored in decimal form. (There are also some integers, also stored as character.) I want to keep the current structure of the data frame, i.e. I do not want a list as a result.
Example data frame (note: the real data frame has all elements as character; here they come out as factors because I couldn't figure out how to replicate a data frame of characters):
a <- c("1","1/2","2")
b <- c("5/2","3","7/2")
c <- c("4","9/2","5")
df <- data.frame(a,b,c)
I tried df[] <- apply(df,1, function(x) eval(parse(text=x))). This calculates the numbers correctly, but only for the last column, populating the data frame with that.
Result:
a b c
1 4 4.5 5
2 4 4.5 5
3 4 4.5 5
I also tried df[] <- lapply(df, function(x) eval(parse(text=x))), which had the following result (and I have no idea why):
a b c
1 3 3 2
2 3 3 2
3 3 3 2
Desired result:
a b c
1 1 2.5 4
2 0.5 3 4.5
3 2 3.5 5
Thanks a lot!

You are probably looking for:
df[] <- apply(df, c(1, 2), function(x) eval(parse(text = x)))
df
a b c
1 1.0 2.5 4.0
2 0.5 3.0 4.5
3 2.0 3.5 5.0
eval(parse(text = x))
evaluates one expression at a time, so you need to run it cell by cell.
EDIT: if some data frame elements cannot be evaluated, you can account for that by adding an if/else inside the function:
df[] <- apply(df, c(1, 2), function(x) if(x %in% skip){NA} else {eval(parse(text = x))})
where skip is a vector of elements that should not be evaluated.
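For example (a small sketch; the skip values here are invented purely for illustration):
skip <- c("", "n/a")   # hypothetical markers that should not be parsed
df[] <- apply(df, c(1, 2), function(x) if(x %in% skip){NA} else {eval(parse(text = x))})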

Firstly, you should prevent your characters from being turned into factors by data.frame():
df <- data.frame(a, b, c, stringsAsFactors = FALSE)
Then you can nest a second sapply inside an outer sapply/lapply to achieve what you want.
sapply(X = df, FUN = function(v) {
  sapply(X = v, FUN = function(w) eval(parse(text = w)))
})
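Note that sapply() returns a matrix here; to keep the data frame structure the question asks for, you can assign the result back into df, e.g.:
df[] <- sapply(df, function(v) sapply(v, function(w) eval(parse(text = w))))
df
#     a   b   c
# 1 1.0 2.5 4.0
# 2 0.5 3.0 4.5
# 3 2.0 3.5 5.0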
Side Notes
If you feed eval an improper expression such as expression(1, 1/2, 2), it evaluates to the last value. This explains the 4 4.5 5 output. A proper expression(c(1, 1/2, 2)) evaluates to the expected answer (a quick check follows these notes).
The code lapply(df, function(x) eval(parse(text=x))) returns 3 3 2 because sapply(data.frame(a,b,c), as.numeric) returns:
a b c
[1,] 1 2 1
[2,] 2 1 3
[3,] 3 3 2
These numbers correspond to the levels() of the factors, through which you were storing your fractions.
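The first of these notes is easy to check at the console:
eval(parse(text = c("1", "1/2", "2")))
# [1] 2   (only the value of the last expression is returned)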

To those looking for a one-liner: you can use parse_ratio from the DOSE package to coerce the character fractions to numeric.
library(DOSE)
b <- c("5/2","3","7/2")
parse_ratio(b)
[1] 2.5 1.0 3.5
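If you would rather avoid the Bioconductor dependency, here is a base-R sketch of the same idea (splitting on "/" and dividing); the helper name parse_fraction is made up here, and whole numbers pass through unchanged:
parse_fraction <- function(x) {
  vapply(strsplit(x, "/", fixed = TRUE), function(p) {
    p <- as.numeric(p)
    if (length(p) == 2) p[1] / p[2] else p[1]
  }, numeric(1))
}
parse_fraction(b)
# [1] 2.5 3.0 3.5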

Related

Duplicating R dataframe vector values using another vector as a guide

I have the following R dataframe: df = data.frame(value=c(5,4,3,2,1), a=c(2,0,1,6,9), b=c(7,0,0,3,4)). I would like to duplicate the values of a and b by the number of times of the corresponding position values in value. For example, Expanding b would look like b_ex = c(7,7,7,7,7,2,2,2,4). No values of three or four would be in b_ex because values of zero are in b[2] and b[3]. The expanded vectors would be assigned names and be stand-alone.
Thanks!
Maybe you are looking for:
result <- lapply(df[-1], function(x) rep(x[x != 0], df$value[x != 0]))
#$a
#[1] 2 2 2 2 2 1 1 1 6 6 9
#$b
#[1] 7 7 7 7 7 3 3 4
To have them as separate vectors in the global environment, use list2env:
list2env(result, .GlobalEnv)
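If you want the names from the question (e.g. b_ex), you can rename the list before exporting it; the _ex suffix below is just an assumption based on the question's example:
names(result) <- paste0(names(result), "_ex")
list2env(result, .GlobalEnv)
b_ex
# [1] 7 7 7 7 7 3 3 4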

R apply function to nested list elements using "[["

Given a nested list of numeric vectors like
l = list( a = list(1:2, 3:5), b = list(6:10, 11:16))
If I want to apply a function, say length, to the "index 1 / first" numeric vectors, I can do it using the subsetting function [[:
> sapply(lapply(l, "[[", 1), length)
a b
2 5
I can't figure out how to supply arbitrary indices to [[ in order to get the length of (in this example) both vectors in every sub-list (a naive try: sapply(lapply(l, "[[", 1:2), length)).
[[ can only extract a single element. For more than one we need [ instead, and then lengths:
sapply(lapply(l, "[", 1:2), lengths)
# a b
#[1,] 2 5
#[2,] 3 6
Not using base, but purrr is a great package for lists.
library(purrr)
map_dfc(l, ~lengths(.[1:2]))
# A tibble: 2 x 2
a b
<int> <int>
1 2 5
2 3 6
Maybe the code below can help...
> sapply(l, function(x) sapply(x, length))
a b
[1,] 2 5
[2,] 3 6
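As a side note, lengths() already returns the length of every element of a list, so the inner loop can be dropped entirely; a quick sketch:
sapply(l, lengths)
#      a b
# [1,] 2 5
# [2,] 3 6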

R: Unpack a function that returns multiple objects to multiple columns of a dataframe

Hopefully I can just explain the issue:
I have a function of the following form which returns two values of interest.
return_network <- function(team_id){
  ... [do something to produce adjacency matrix and network density measures]
  g <- graph.adjacency(co_occur, weighted=TRUE, mode ='undirected')
  g <- simplify(g)
  return(c(weighted_network_density, g))
}
I then want to iterate over a column in a dataframe, apply the above function, and unpack it to two columns. I have tried the following:
team_measures[, c('weighted_network_density', 'graph_object')] <- apply(team_measures[, "team_id", drop=F], 1, return_network)
However, I get a warning message:
Warning message:
In `[<-.data.frame`(`*tmp*`, , c("weighted_network_density", "graph_object"), :
provided 429 variables to replace 2 variables
And the resulting dataframe is full of nonsense.
Here's a guess at the problem: the output of each step in apply is bound as columns, even when you apply over rows, so the result is transposed from the way (at least I) would expect. My simple example below doesn't reproduce your exact error, but it shows the shape problem. So if we have this data.frame:
df <- data.frame(dog = c(1,2,3), cat = c(4,5,6), fish = c(7,8,9))
df
dog cat fish
1 1 4 7
2 2 5 8
3 3 6 9
If we apply a function by rows that returns 2 values, we get a matrix with 2 rows:
apply(df, 1, function(x) c(x['dog'], x['cat']))
[,1] [,2] [,3]
dog 1 2 3
cat 4 5 6
If we leave it as a matrix, we can pass it into 2 columns of a data frame without an error message, but it will be coerced in a strange way that gives a nonsensical result:
df2 <- df
df2[,c('cat', 'fish')] <- apply(df, 1, function(x) c(x['dog'], x['cat']))
df2
dog cat fish
1 1 1 5
2 2 4 3
3 3 2 6
If we convert the result to a data.frame before assigning it (which might be happening somewhere in your code) we get a similar error:
df2[,c('cat', 'fish')] <- as.data.frame(apply(df, 1, function(x) c(x['dog'], x['cat'])))
Error in `[<-.data.frame`(`*tmp*`, , c("cat", "fish"), value = list(V1 = c(1, :
replacement element 1 has 2 rows, need 3
Transposing the result before passing it in silences the error and results in the data being put in the right way:
df2[,c('cat', 'fish')] <- as.data.frame(t(apply(df, 1, function(x) c(x['dog'], x['cat']))))
df2
dog cat fish
1 1 1 4
2 2 2 5
3 3 3 6
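Applied back to the original code, a rough sketch, under the assumption that return_network() returns a plain two-element vector (note that g is an igraph object, so a regular data frame column is a poor home for it; a list-column or a separate list is usually better):
res <- t(apply(team_measures[, "team_id", drop = FALSE], 1, return_network))
team_measures[, c("weighted_network_density", "graph_object")] <- as.data.frame(res)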

Extract only first line in a data frame from several subgroups that satisfy a conditional

I have a data frame similar to the dummy example here:
df<-data.frame(Group=rep(letters[1:3],each=3),Value=c('NA','NA','10','NA','4','8','NA','NA','2'))
In the original data frame there are many more groups, each with 10 values. For each group (a, b or c) I would like to extract the first line where Value != NA, but only the first such line. Since within a group there could be several values, different from NA and from each other, I can't simply subset.
I was imagining something like this using plyr and a conditional, but I honestly have no idea what the conditional should take:
ddply(df, .(Group), function(sub_data){
  for(i in 1:length(sub_data$Value)){
    if(sub_data$Value[i] != 'NA'){
      # take the value, but only for the first non-NA
      return(first line that satisfies)
    }
  }
})
Maybe this is easy with other strategies that I don't know of
Any suggestion is very much appreciated!
I know this has been answered but for this you should be looking at the data.table package. It provides a very expressive and terse syntax for doing what you ask:
df<-data.table(Group=rep(letters[1:3],each=3),Value=c('NA','NA','10','NA','4','8','NA','NA','2'))
> df[ Value != "NA", .SD[1], by=Group ]
Group Value
1: a 10
2: b 4
3: c 2
Do yourself a favor and learn data.table.
Some other notes:
You can easily convert data.frames to data.tables (a small sketch follows these notes).
I think you don't want the string "NA" but simply NA in your example; in that case the syntax is:
df[ ! is.na(Value), .SD[1], by=Group ]
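For example, with real NA values:
library(data.table)
df <- data.frame(Group = rep(letters[1:3], each = 3),
                 Value = c(NA, NA, 10, NA, 4, 8, NA, NA, 2))
setDT(df)          # converts the data.frame to a data.table in place (by reference)
df[ !is.na(Value), .SD[1], by = Group ]
#    Group Value
# 1:     a    10
# 2:     b     4
# 3:     c     2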
Since you suggested plyr in the first place:
ddply(subset(df, !is.na(Value)), .(Group), head, 1L)
That assumes you have NAs and not 'NA's. If the latter (not recommended), then:
ddply(subset(df, Value != 'NA'), .(Group), head, 1L)
Note how concise this is. I would agree with using plyr.
If you're willing to use actual NA's vs strings, then the following should give you what you're looking for:
df <- data.frame(Group=rep(letters[1:3], each=3),
                 Value=c(NA,NA,'10',NA,'4','8',NA,NA,'2'))
print(df)
## Group Value
## 1 a <NA>
## 2 a <NA>
## 3 a 10
## 4 b <NA>
## 5 b 4
## 6 b 8
## 7 c <NA>
## 8 c <NA>
## 9 c 2
df.1 <- by(df, df$Group, function(x) {
  head(x[complete.cases(x), ], 1)
})
print(df.1)
## df$Group: a
## Group Value
## 3 a 10
## ------------------------------------------------------------------------
## df$Group: b
## Group Value
## 5 b 4
## ------------------------------------------------------------------------
## df$Group: c
## Group Value
## 9 c 2
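If you would rather have the result as a single data frame instead of a by list, one small follow-up (a sketch):
do.call(rbind, df.1)   # binds the one-row pieces back into one data frame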
First you should take care of NA's:
options(stringsAsFactors=FALSE)
df<-data.frame(Group=rep(letters[1:3],each=3),Value=c(NA,NA,'10',NA,'4','8',NA,NA,'2'))
And then maybe something like this would do the trick:
for(i in unique(df$Group)) {
  for(j in df$Value[df$Group==i]) {
    if(!is.na(j)) {
      print(paste(i,j))
      break
    }
  }
}
Assuming that Value is actually numeric, not character.
> df <- data.frame(Group=rep(letters[1:3],each=3),
                   Value=c(NA, NA, 10, NA, 4, 8, NA, NA, 2))
> do.call(rbind, lapply(split(df, df$Group), function(x){
x[ is.na(x[,2]) == FALSE, ][1,]
}))
## Group Value
## a a 10
## b b 4
## c c 2
I don't see any solutions using aggregate(...), which would be the simplest:
df<-data.frame(Group=rep(letters[1:3],each=3),Value=c('NA','NA','10','NA','4','8','NA','NA','2'))
aggregate(Value~Group,df[df$Value!="NA",],head,1)
# Group Value
# 1 a 10
# 2 b 4
# 3 c 2
If your df contains actual NA, and not "NA" as in your example, then use this:
df<-data.frame(Group=rep(letters[1:3],each=3),Value=c(NA,NA,'10',NA,'4','8',NA,NA,'2'))
aggregate(Value~Group,df[!is.na(df$Value),],head,1)
Group Value
1 a 10
2 b 4
3 c 2
Your life would be easier if you marked missing values with NA and not with the character string 'NA'; the former is really missing to R, and it has tools to work with such missingness. The latter ('NA') is not really missing, except for the meaning that this string has to you alone; R cannot divine that information directly. Assuming you correct this, the solution below is one way to go about it.
Similar in spirit to @hrbrmstr's by(), but to my eyes aggregate() gives nicer output:
> foo <- function(x) head(x[complete.cases(x)], 1)
> aggregate(Value ~ Group, data = df, foo)
Group Value
1 a 10
2 b 4
3 c 2
> aggregate(df$Value, list(Group = df$Group), foo)
Group x
1 a 10
2 b 4
3 c 2

Subsetting data frame by factor level

I have a big data frame with state names in one column and different indexes in the other columns.
I want to subset by state and either create an object suitable for minimizing the index, or a data frame with the minimum already calculated.
Here's one simple (short) example of what I have
m
x y
1 A 1.0
2 A 2.0
3 A 1.5
4 B 3.0
5 B 3.5
6 C 7.0
I want to get this
m
x y
1 A 1.0
2 B 3.0
3 C 7.0
I don't know if a function with a for loop is necessary. Something like:
minimize <- function(x, ...) {
  for (i in m$x) {
    # do something with the data by factor value
    # apply the min function to that something in every column
    return(y)
  }
}
so when you call
minimize(A)
[1] 1
I tried to use %in% but it didn't work (I got this error):
A%in%m
Error in match(x, table, nomatch = 0L) : object 'A' not found
When I define it, it goes like this:
A<-c("A")
"A"%in%m
[1] FALSE
Thank you in advance
Use aggregate:
> aggregate(. ~ x, data = m, FUN = min)
x y
1 A 1
2 B 3
3 C 7
See this post to get some other alternatives.
Try aggregate:
aggregate(y ~ x, m, min)
x y
1 A 1
2 B 3
3 C 7
Using data.table
require(data.table)
m <- data.table(m)
m[, j=min(y), by=x]
# x V1
# 1: A 1
# 2: B 3
# 3: C 7
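If you really want the minimize("A") interface sketched in the question, here is one way to write it, assuming your data frame is called m with columns x and y as shown above:
minimize <- function(group, data = m) {
  min(data$y[data$x == group])
}
minimize("A")
# [1] 1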
