I am using the below code for calculating the mode of a dataframe:
library(modeest)
apply(df[ ,2:length(df)], 1, mfv)
My data looks like this:
Item A B C
Book001 56 32 56
Book002 95 95 20
Book003 50 89 50
Book004 6 65 40
It gives me the following output:
[[1]]
[1] 56
[[2]]
[1] 95
[[3]]
[1] 50
[[4]]
[1] 6 40 65
This code is perfect only if the data contains a recurring term.
How can I display the mode as NA when there is no recurring term?
Let's try with a custom function:
foo <- function(x) {
  out <- mfv(x)
  # more than one most-frequent value means there is no single mode
  if (length(out) > 1) out <- NA
  return(out)
}
apply(df[ ,2:length(df)], 1, foo)
# [1] 56 95 50 NA
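If installing modeest is not an option, the same idea can be sketched in base R with table(). This is only an illustration and not part of the original answer: the helper name mode_or_na is made up here, and like foo above it returns NA whenever there is no single most frequent value (which also covers ties between recurring values).

```r
# Sample data from the question
df <- data.frame(Item = c("Book001", "Book002", "Book003", "Book004"),
                 A = c(56, 95, 50, 6),
                 B = c(32, 95, 89, 65),
                 C = c(56, 20, 50, 40))

# Hypothetical helper: the single most frequent value, or NA if there is none
mode_or_na <- function(x) {
  counts <- table(x)                           # frequency of each distinct value
  top <- names(counts)[counts == max(counts)]  # value(s) with the highest count
  if (length(top) > 1) NA_real_ else as.numeric(top)
}

# Apply row-wise to the numeric columns only
row_modes <- apply(df[, -1], 1, mode_or_na)
row_modes
# [1] 56 95 50 NA
```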
Suppose, I have a dataframe, df, and I want to create a new column called "c" based on the addition of two existing columns, "a" and "b". I would simply run the following code:
df$c <- df$a + df$b
But I also want to do this for many other columns. So why won't my code below work?
# Reproducible data:
martial_arts <- data.frame(gym_branch=c("downtown_a", "downtown_b", "uptown", "island"),
day_boxing=c(5,30,25,10),day_muaythai=c(34,18,20,30),
day_bjj=c(0,0,0,0),day_judo=c(10,0,5,0),
evening_boxing=c(50,45,32,40), evening_muaythai=c(50,50,45,50),
evening_bjj=c(60,60,55,40), evening_judo=c(25,15,30,0))
# Creating a list of the new column names of the columns that need to be added to the martial_arts dataframe:
pattern<-c("_boxing","_muaythai","_bjj","_judo")
d<- expand.grid(paste0("martial_arts$total",pattern))
# Creating lists of the columns that will be added to each other:
e<- names(martial_arts %>% select(day_boxing:day_judo))
f<- names(martial_arts %>% select(evening_boxing:evening_judo))
# Writing a function and using mapply:
kick_him <- function(d,e,f){d <- rowSums(martial_arts[ , c(e, f)], na.rm=T)}
mapply(kick_him,d,e,f)
Now, mapply produces the correct results in terms of the addition:
> mapply(kick_him,d,e,f)
Var1 <NA> <NA> <NA>
[1,] 55 84 60 35
[2,] 75 68 60 15
[3,] 57 65 55 35
[4,] 50 80 40 0
But it doesn't add the new columns to the martial_arts dataframe. The function in theory should do the following
martial_arts$total_boxing <- martial_arts$day_boxing + martial_arts$evening_boxing
...
...
martial_arts$total_judo <- martial_arts$day_judo + martial_arts$evening_judo
and add four new total columns to martial_arts.
So what am I doing wrong?
The assignment is the problem. Inside kick_him, `d <- ...` only rebinds the function's local argument d; it never touches martial_arts. Instead of passing martial_arts$total_boxing around as a string, use the bare column name "total_boxing" and put it on the left-hand side of the Map/mapply call. Since the OP already built the 'martial_arts$' prefix into the 'd' dataset as a column, we strip that prefix and then do the assignment:
kick_him <- function(e,f){rowSums(martial_arts[ , c(e, f)], na.rm=TRUE)}
martial_arts[sub(".*\\$", "", d$Var1)] <- Map(kick_him, e, f)
Check the dataset now:
> martial_arts
gym_branch day_boxing day_muaythai day_bjj day_judo evening_boxing evening_muaythai evening_bjj evening_judo total_boxing total_muaythai total_bjj total_judo
1 downtown_a 5 34 0 10 50 50 60 25 55 84 60 35
2 downtown_b 30 18 0 0 45 50 60 15 75 68 60 15
3 uptown 25 20 0 5 32 45 55 30 57 65 55 35
4 island 10 30 0 0 40 50 40 0 50 80 40 0
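The same result can also be reached with a plain loop over the discipline suffixes, which makes it explicit that the assignment happens on the data frame itself. This is a minimal sketch, assuming the day_*/evening_* naming used in the question; the vector name sports is made up for illustration.

```r
# Reproducible data from the question
martial_arts <- data.frame(
  gym_branch = c("downtown_a", "downtown_b", "uptown", "island"),
  day_boxing = c(5, 30, 25, 10), day_muaythai = c(34, 18, 20, 30),
  day_bjj = c(0, 0, 0, 0), day_judo = c(10, 0, 5, 0),
  evening_boxing = c(50, 45, 32, 40), evening_muaythai = c(50, 50, 45, 50),
  evening_bjj = c(60, 60, 55, 40), evening_judo = c(25, 15, 30, 0)
)

sports <- c("boxing", "muaythai", "bjj", "judo")  # illustrative name

for (s in sports) {
  # Build the two source column names, e.g. "day_boxing" and "evening_boxing"
  cols <- paste0(c("day_", "evening_"), s)
  # Assign by column name on the data frame, not to a local variable
  martial_arts[[paste0("total_", s)]] <- rowSums(martial_arts[, cols], na.rm = TRUE)
}

martial_arts[, paste0("total_", sports)]
```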
How can I fix this problem for this data?
> x=data.frame(v1=c("a" ,"b" ,"c" ,"d" ,"e"),
+ v2=c(97 ,90 ,93 ,97 ,90),
+ v3=c( 85 ,91 ,87 ,91 ,93))
> library(e1071)
> f <- cmeans(x, 2)
Error in cmeans(x, 2) : NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning messages:
1: In cmeans(x, 2) : NAs introduced by coercion
2: In cmeans(x, 2) : NAs introduced by coercion
> f
I want to apply c-means to my data, as illustrated in the code above. It contains three vectors: v1, v2 and v3. I want to apply c-means with the rows labeled by vector v1.
If we look at the documentation of ?cmeans,
x - The data matrix where columns correspond to variables and rows to observations.
So, we can convert the data.frame to matrix after removing the character column (1st column)
x1 <- as.matrix(x[-1])
row.names(x1) <- x[,1]
cmeans(x1, 2)
#Fuzzy c-means clustering with 2 clusters
#Cluster centers:
# v2 v3
#1 90.30090 91.85191
#2 95.75436 87.22535
#Memberships:
# 1 2
#a 0.06614213 0.93385787
#b 0.98305641 0.01694359
#c 0.19855988 0.80144012
#d 0.25730888 0.74269112
#e 0.97924422 0.02075578
#Closest hard clustering:
#a b c d e
#2 1 2 2 1
#Available components:
#[1] "centers" "size" "cluster" "membership" "iter" "withinerror" "call"
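The coercion warnings in the question are consistent with the whole data frame being turned into a single-type matrix. As a rough base R illustration (no e1071 needed; the name num_cols is made up for this sketch), a data frame that mixes character and numeric columns becomes a character matrix, whereas keeping only the numeric columns gives a numeric matrix that can carry v1 as row names:

```r
x <- data.frame(v1 = c("a", "b", "c", "d", "e"),
                v2 = c(97, 90, 93, 97, 90),
                v3 = c(85, 91, 87, 91, 93),
                stringsAsFactors = FALSE)

# Mixed column types: the matrix falls back to character
mixed <- as.matrix(x)
is.character(mixed)

# Keep only numeric columns, then label the rows with v1
num_cols <- vapply(x, is.numeric, logical(1))
x1 <- as.matrix(x[num_cols])
rownames(x1) <- as.character(x$v1)
is.numeric(x1)
x1
```

Selecting columns by type rather than by position (x[-1]) is a design choice that keeps the snippet working if the label column is not the first one.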
The k-means family of partitional clustering algorithms is built on computing means, so by nature it accepts only numeric values. You are getting an error because the data frame contains both numeric and categorical values, which cmeans() cannot handle. Also, there is no need to convert the data frame to a matrix, because that is not the actual problem.
Alternative approach
Discretize the character variable by assigning it numbers and then apply clustering. This way there is no need to drop any variable.
# create empty data frame
df<- setNames(data.frame(matrix(ncol = 5, nrow = 5)), c("a" ,"b" ,"c" ,"d" ,"e"))
# fill values
df$a<- c("aaaa" ,"bbbb" ,"cccc" ,"dddd" ,"eeee")
df$b<- c(97 ,90 ,93 ,97 ,90)
df$c<- c(97 ,90 ,93 ,97 ,90)
df$d<- c( 85 ,91 ,87 ,91 ,93)
df$e<- c( 85 ,91 ,87 ,91 ,93)
# show the dataframe
df
a b c d e
1 aaaa 97 97 85 85
2 bbbb 90 90 91 91
3 cccc 93 93 87 87
4 dddd 97 97 91 91
5 eeee 90 90 93 93
# Discretize the character variable
df$a <- as.numeric( factor(df$a) ) -1
df
a b c d e
1 0 97 97 85 85
2 1 90 90 91 91
3 2 93 93 87 87
4 3 97 97 91 91
5 4 90 90 93 93
# Apply clustering
library(e1071)
cmeans(df, 2)
Fuzzy c-means clustering with 2 clusters
Cluster centers:
a b c d e
1 1.406 95.72 95.72 87.18 87.18
2 2.510 90.36 90.36 91.85 91.85
Memberships:
1 2
[1,] 0.92728 0.07272
[2,] 0.04014 0.95986
[3,] 0.80061 0.19939
[4,] 0.72009 0.27991
[5,] 0.03544 0.96456
Closest hard clustering:
[1] 1 2 1 1 2
Available components:
[1] "centers" "size" "cluster" "membership" "iter"
[6] "withinerror" "call"
I have a function which returns a list of two objects (a list l and a number n). I want to loop over this function in a foreach loop.
create_lists <- function(){
l = sample(100, 5)
n = sample(100, 1)
return(list(l=l, n=n))}
Because create_lists has a list as output, this post told me to use a combine function which looks like this:
combine_custom <- function(list1, list2){
ls = c(list1$l, list2$l)
ns = c(list1$n, list2$n)
return(list(l = ls, n = ns))
}
So now my foreach loop looks like this:
m = foreach(i=1:5, .combine = combine_custom)%do%{
create_lists()}
My desired output would be:
m$l
[[1]]
[1] 100 25 86 21 28
[[2]]
[1] 78 37 79 41 61
[[3]]
[1] 73 22 78 94 13
[[4]]
[1] 15 28 76 78 52
[[5]]
[1] 32 93 92 2 1
m$n
[1] 52 56 3 79 82
But what I get is something like this:
$l
[1] 84 28 75 59 68 84 28 75 59 68
$n
[1] 31 91 18 98 39
So I have two problems:
1) Why is everything but two of the l lists dropped?
2) How can I make m$l to be a list of lists?
EDIT:
I tried another approach I got from here which does not use c:
combine_custom <- function(list1, list2){
ls = list1$l[[length(list1$l)+1]] = list(list2$l)
ns = c(list1$n, list2$n)
return(list(l = ls, n = ns))
}
But this gave the same result as described above, to be exact:
$l
$l[[1]]
[1] 65 84 48 81 82
$n
[1] 88 79 92 36 71
I have found another way which avoids the problem mentioned above, namely that combine has to create a new list first and later only append lists.
Also, the real function I am using actually returns a list of lists, so the following proved useful:
combine_custom <- function(list1, list2) {
if (plotrix::listDepth(list1$l) > plotrix::listDepth(list2$l)) {
ls <- c(list1$l, list(list2$l))
} else {
ls <- c(list(list1$l), list(list2$l))
}
ns <- c(list1$n, list2$n)
return(list(l = ls, n = ns))
}
This is not perfect if the function can return lists of varying nesting depths, but it works in my case.
The combine part is giving a lot of trouble because on the first iteration it needs to make a list out of two lists, but on the second iteration it needs to append one list as an element to a list of lists.
Another approach (may or may not work depending on the size of your actual data/problem) is to use the purrr package for working with lists:
> m <- foreach(i=1:3)%do%{create_lists()}
> m
[[1]]
[[1]]$l
[1] 21 33 12 50 36
[[1]]$n
[1] 74
[[2]]
[[2]]$l
[1] 12 80 39 78 6
[[2]]$n
[1] 74
[[3]]
[[3]]$l
[1] 9 61 75 63 94
[[3]]$n
[1] 2
> purrr::transpose(m)
$l
$l[[1]]
[1] 21 33 12 50 36
$l[[2]]
[1] 12 80 39 78 6
$l[[3]]
[1] 9 61 75 63 94
$n
$n[[1]]
[1] 74
$n[[2]]
[1] 74
$n[[3]]
[1] 2
Hope that helps!
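If adding purrr is not desirable, the same reshaping can be approximated in base R. This is a sketch, assuming every element of m has components named l and n as in create_lists(); m is built with lapply() instead of foreach here only so that the snippet has no package dependencies. Unlike purrr::transpose(), it returns n as a plain vector, which is closer to the desired output in the question.

```r
create_lists <- function() {
  list(l = sample(100, 5), n = sample(100, 1))
}

set.seed(1)  # only to make the illustration repeatable
m <- lapply(1:3, function(i) create_lists())

res <- list(
  l = lapply(m, `[[`, "l"),              # keep each l as its own list element
  n = vapply(m, `[[`, integer(1), "n")   # collect the single numbers into a vector
)
str(res)
```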
Thank you @Maria H., you solved my problem! The 'plotrix' package didn't work for me, but I used 'collapse' and it worked fine:
combine_custom1 <- function(a, b) {
if (collapse::ldepth(a) > collapse::ldepth(b)) {
ls <- c(a, list(b))
} else {
ls <- c(list(a), list(b))
}
return(ls)
}
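One way to sidestep the depth check altogether, offered here as an assumption rather than something taken from the answers above, is to have each iteration wrap its l in a list before it reaches the combine function. Every input to the combine step then has the same shape, and c() simply appends. In the sketch below, Reduce() stands in for the pairwise folding that, as far as I understand, a two-argument .combine function performs, so the snippet runs without foreach. The names create_wrapped and combine_wrapped are made up.

```r
# Each result already carries l as a one-element list
create_wrapped <- function() {
  list(l = list(sample(100, 5)), n = sample(100, 1))
}

# Both inputs have the same shape, so plain c() is enough
combine_wrapped <- function(a, b) {
  list(l = c(a$l, b$l), n = c(a$n, b$n))
}

set.seed(42)  # only to make the illustration repeatable
results <- lapply(1:5, function(i) create_wrapped())
m <- Reduce(combine_wrapped, results)

length(m$l)  # one list element per iteration
m$n
```

With foreach, passing combine_wrapped as .combine and calling create_wrapped() in the loop body should presumably behave the same way, but that is untested here.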
I have a matrix that consists of two columns and a number (n) of rows, while each row represents a point with the coordinates x and y (the two columns).
This is what it looks like (LINK):
V1 V2
146 17
151 19
153 24
156 30
158 36
163 39
168 42
173 44
...
Now I would like to take three consecutive points at a time, starting from row 1, do some fitting on them, save the values from this fit in another list, and then go on to the next three points, and the next three, ... until the list is finished. Something like this:
Data_Fit_Kasa_1 <- CircleFitByKasa(Data[1:3,])
Data_Fit_Kasa_2 <- CircleFitByKasa(Data[4:6,])
....
Data_Fit_Kasa_n <- CircleFitByKasa(Data[i:(i+2),])
I have tried to construct a loop, but I can't make it work. R either tells me that there's an "unexpected '}' in "}"" or that the "subscript is out of bounds". This is what I've tried:
minimal runnable code
install.packages("conicfit")
library(conicfit)
CFKasa <- NULL
Data.Fit <- NULL
for (i in 1:length(Data)) {
row <- Data[i:(i+2),]
CFKasa <- CircleFitByKasa(row)
Data.Fit[i] <- CFKasa[3]
}
RStudio Version 0.99.902 – © 2009-2016 RStudio, Inc.; Win10 Edu.
The third element of the fitted circle (CFKasa[3]) represents the radius, which is what I am really interested in. I am really stuck here, please help.
Many thanks in advance!
Best, David
Turn your data into a 3D array and use apply:
DF <- read.table(text = "V1 V2
146 17
151 19
153 24
156 30
158 36
163 39", header = TRUE)
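a <- t(DF)                             # 2 x n matrix, one column per point
dim(a) <- c(nrow(a), 3, ncol(a) / 3)   # blocks of 3 points (n must be a multiple of 3)
a <- aperm(a, c(2, 1, 3))              # each slice becomes a 3 x 2 matrix of points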
# , , 1
#
# [,1] [,2]
# [1,] 146 17
# [2,] 151 19
# [3,] 153 24
#
# , , 2
#
# [,1] [,2]
# [1,] 156 30
# [2,] 158 36
# [3,] 163 39
center <- function(m) c(mean(m[,1]), mean(m[,2]))
t(apply(a, 3, center))
# [,1] [,2]
#[1,] 150 20
#[2,] 159 35
center(DF[1:3,])
#[1] 150 20
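The center() function above is only a stand-in for the actual fit. To get one radius per block of three points, the same chunking can also be written as a loop over start indices. The sketch below rests on some assumptions: kasa_fit is a small hand-written algebraic (Kasa-style) circle fit, included so the example runs without conicfit, and it is not the package's implementation. In practice CircleFitByKasa(Data[i:(i + 2), ])[3] could be called in its place, given that the question states its third element is the radius.

```r
DF <- read.table(text = "V1 V2
146 17
151 19
153 24
156 30
158 36
163 39", header = TRUE)
Data <- as.matrix(DF)

# Hypothetical stand-in for CircleFitByKasa: returns c(center_x, center_y, radius)
kasa_fit <- function(xy) {
  x <- xy[, 1]
  y <- xy[, 2]
  # Solve x^2 + y^2 + B*x + C*y + D = 0 for (B, C, D) in the least-squares sense
  coefs <- unname(qr.solve(cbind(x, y, 1), -(x^2 + y^2)))
  cx <- -coefs[1] / 2
  cy <- -coefs[2] / 2
  c(cx, cy, sqrt(cx^2 + cy^2 - coefs[3]))
}

# Start rows 1, 4, 7, ...; stopping at nrow - 2 keeps i + 2 inside the matrix
starts <- seq(1, nrow(Data) - 2, by = 3)
radii <- vapply(starts, function(i) kasa_fit(Data[i:(i + 2), ])[3], numeric(1))
radii
```

Iterating over start indices instead of 1:length(Data) is what avoids the out-of-bounds subscript: length() of a matrix counts all cells, not rows.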
Here is the data set; say its name is DS.
Abc Def Ghi
1 41 190 67
2 36 118 72
3 12 149 74
4 18 313 62
5 NA NA 56
6 28 NA 66
7 23 299 65
8 19 99 59
9 8 19 61
10 NA 194 69
How can I get a new dataset DSS where the value of column Abc is greater than 25 and the value of column Def is greater than 100? It should also ignore any row where the value of at least one column is NA.
I have tried few options but wasn't successful. Your help is appreciated.
There are multiple ways of doing it. I have given 5 methods, and the first 4 methods are faster than the subset function.
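R Code:
library(dplyr)  # needed only for Method 4
# Method 1: logical indexing returns all-NA rows where the condition is NA; na.omit drops them
DS_Filtered <- na.omit(DS[(DS$Abc > 25 & DS$Def > 100), ])
# Method 2: which() also skips rows where the condition is NA
DS_Filtered <- DS[ which( DS$Abc > 25 & DS$Def > 100) , ]
# Method 3: same as Method 1, with each condition parenthesized
DS_Filtered <- na.omit(DS[(DS$Abc > 25) & (DS$Def > 100), ])
# Method 4: using the dplyr package; filter() drops rows where the condition is NA
DS_Filtered <- filter(DS, Abc > 25, Def > 100)
DS_Filtered <- DS %>% filter(Abc > 25 & Def > 100)
# Method 5: subset() by default drops rows where the condition is NA
DS_Filtered <- subset(DS, Abc > 25 & Def > 100)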
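For a self-contained check against the sample data, the sketch below rebuilds DS and uses the thresholds stated in the question (Abc > 25, Def > 100). It adds complete.cases() as an explicit guard, which is my assumption about how to read "ignore any row with an NA in at least one column": it also covers NAs in columns that are not part of the condition, such as Ghi.

```r
DS <- data.frame(
  Abc = c(41, 36, 12, 18, NA, 28, 23, 19, 8, NA),
  Def = c(190, 118, 149, 313, NA, NA, 299, 99, 19, 194),
  Ghi = c(67, 72, 74, 62, 56, 66, 65, 59, 61, 69)
)

# FALSE & NA is FALSE, so rows with any NA are excluded before the comparisons matter
keep <- complete.cases(DS) & DS$Abc > 25 & DS$Def > 100
DSS <- DS[keep, ]
DSS
```

On this data only rows 1 and 2 satisfy both conditions; row 6 has Abc = 28 but a missing Def, so it is dropped.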