Making a function that builds a dataframe - r

I'm trying to write a function that builds a data frame and returns it. The new data frame is made of columns taken from another data frame I have, called metadata, in addition to some extra data that I want to include or leave out by passing TRUE or FALSE when calling the function.
Here is what I did:
make_data = function(metric, use_additions = FALSE){
data = data.frame(my_metric = metadata[['metric']], gender = metadata$Gender ,
age = as.numeric(metadata$Age) , use_additions = t(additional_data))
data = data %>% dplyr::select(my_metric, everything())
return(data)
}
data = make_data(CR, FALSE)
I want to pass a different metric each time while all other features stay the same. Here, for example, I called the function with metric as CR, which is the name of the column I want from metadata. The argument I want to control is use_additions: sometimes I want to add the extra data and sometimes I don't.
metadata and additional_data have exactly the same row names and the same number of rows. It's just a matter of adding the data or not.
I get these errors:
Error in data.frame(metric = metadata[["metric"]], gender = metadata$Gender, :
arguments imply differing number of rows: 0, 1523
In addition: Warning message:
In data.frame(metric = metadata[["metric"]], gender = metadata$Gender, :
Error in data.frame(my_metric = metadata[["metric"]], gender = metadata$Gender, :
arguments imply differing number of rows: 0, 1523
I've tried several ways to do this, with and without quotes, and using $, but none of them worked. For example, when I type metric = metadata[[metric]] I get this:
Error in (function(x, i, exact) if (is.matrix(i)) as.matrix(x)[[i]] else .subset2(x, :
object 'CR' not found

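make_data = function(colname, use_additions = FALSE){
# colname is a string, so [[ ]] looks the column up by name and returns a plain vector
data = data.frame(my_metric = metadata[[colname]], gender = metadata$Gender ,
age = as.numeric(metadata$Age))
# add the extra data only when requested
if (use_additions) data$use_additions=additional_data
return(data)
}
# pass the column name as a quoted string
data = make_data("CR", FALSE)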
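The difference between the literal string "metric" and a column name passed in as a string can be checked on a toy data frame. This is only a sketch: the contents of metadata and additional_data below are made up, and appending the extra columns with cbind() is one possible choice, not necessarily what the original data needs.

```r
# Toy stand-ins for the real objects (assumed shapes, not the asker's data)
metadata <- data.frame(
  CR     = c(0.1, 0.5, 0.9),
  Gender = c("F", "M", "F"),
  Age    = c("34", "51", "28")
)
additional_data <- data.frame(batch = c(1, 2, 1))

# The literal string "metric" is not a column name, so the lookup returns NULL
literal_lookup <- metadata[["metric"]]
is.null(literal_lookup)  # TRUE

make_data <- function(colname, use_additions = FALSE) {
  # Stop early if the requested column does not exist
  stopifnot(is.character(colname), colname %in% names(metadata))
  data <- data.frame(
    my_metric = metadata[[colname]],
    gender    = metadata$Gender,
    age       = as.numeric(metadata$Age)
  )
  # cbind() appends the extra columns only when requested
  if (use_additions) data <- cbind(data, additional_data)
  data
}

without_extra <- make_data("CR")
with_extra    <- make_data("CR", use_additions = TRUE)
names(with_extra)
```

A NULL column has length zero, which is where the 0 in "differing number of rows: 0, 1523" comes from. Calling the function with an unquoted CR makes R look for a variable of that name, hence "object 'CR' not found".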

Related

Problem with for loop when downloading species occurrence data

I want to download occurrence data from the GBIF website using the following R script. When I run the script, I get this error: "Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 1, 0". Any help would be highly appreciated.
My data: data
My R script:
flist<-read_excel("Mekong fish.xlsx",sheet="Sheet1")
##Loop
fname<-list()
Occ<-list()
datfish<-list()
name_list<-unique(flist$Updated_name)
# loop over the species names
for (i in seq_along(name_list)) {
# fetch two records first to read the total occurrence count from $meta
Occ[[i]] <-occ_search(scientificName = name_list[i], limit=2)
fname[[i]]<-occ_search(scientificName = name_list[i],
fields = c("species", "country","decimalLatitude", "decimalLongitude"),
hasCoordinate=T, limit= Occ[[i]]$meta[4],return ="data")
datfish[[i]]<-as.data.frame(fname[[i]]$data)
}
I got a different error:
Expecting logical in D1424 / R1424C4: got 'in Lao'
Expecting logical in D1426 / R1426C4: got 'in China'
Expecting logical in D1467 / R1467C4: got 'only Cambodia'
Expecting logical in D1469 / R1469C4: got 'only in VN'
Expecting logical in D1473 / R1473C4: got 'only in China'
Expecting logical in D1486 / R1486C4: got 'only in Malaysia'
Expecting logical in D1488 / R1488C4: got 'only 1 point in VN'
I think the problem is caused by some fields in the 4th column. I don't have the right packages installed to run your code, but once I dropped the fourth column I got a different error (a missing package).
flist<-read_excel("~/Downloads/Mekong fish.xlsx",sheet="Sheet1")
flist <- subset(flist, select = -4)
...
EDIT:
This worked for me. read_excel guessed the type of column 4 as logical, so the text entries further down the column could not be parsed. When I explicitly set that column to text, it worked.
library(readxl)
library(rgbif)
library(raster)
flist<-read_excel("~/Downloads/Mekong fish.xlsx",
sheet="Sheet1",
col_types = c("numeric", "text", "numeric", "text"))
flist
##Loop
fname<-list()
Occ<-list()
datfish<-list()
name_list<-unique(flist$Updated_name)
# loop over the first two species names as a test
for (i in seq_along(name_list[1:2])) {
message(i)
# fetch two records first to read the total occurrence count from $meta
Occ[[i]] <-occ_search(scientificName = name_list[i], limit=2)
message(Occ[[i]])
fname[[i]]<-occ_search(scientificName = name_list[i],
fields = c("species", "country","decimalLatitude", "decimalLongitude"),
hasCoordinate=T, limit= Occ[[i]]$meta[4],return ="data")
message(fname[[i]])
datfish[[i]]<-as.data.frame(fname[[i]]$data)
message(datfish[[i]])
}
> 1
> list(offset = 0, limit = 2, endOfRecords = FALSE, count = 15)list(list(name = c("Animalia", "Chordata", "Actinopterygii",
> "Cypriniformes", "Cyprinidae", "Aaptosyax", "Aaptosyax grypus"), key = c("1", "44", "204", "1153", "7336", "2363805", "2363806"),
> etc...

Clustering by M3C package : Error in `[.data.frame`(df, neworder2) : undefined columns selected

I had a problem similar to the one posted here. To resolve it, I followed the answer by @Jack Gisby there. Now a new error has shown up.
Working on TCGA data, I was getting the same error (the first error):
Error in `.rowNamesDF<-`(x, value = value) :
duplicate 'row.names' are not allowed
Running duplicated() on each relevant field returned FALSE.
Here is the second error (just after trimming the identifiers so that they no longer start with a common string like "TCGA-"):
Error in `[.data.frame`(df, neworder2) : undefined columns selected
> traceback()
5: stop("undefined columns selected")
4: `[.data.frame`(df, neworder2)
3: df[neworder2]
2: M3Creal(as.matrix(mydata), maxK = maxK, reps = repsreal, pItem = pItem,
pFeature = 1, clusterAlg = clusteralg, distance = distance,
title = "/home/christopher/Desktop/", des = des, lthick = lthick,
dotsize = dotsize, x1 = pacx1, x2 = pacx2, seed = seed, removeplots = removeplots,
silent = silent, fsize = fsize, method = method, objective = objective)
1: M3C(pro.vst, des = clin, removeplots = FALSE, iters = 25, objective = "PAC",
fsize = 8, lthick = 1, dotsize = 1.25)
I've added to an opened issue on the M3C GitHub.
I got the same error as Hamid Ghaedi while running M3C. I managed to track it down to the following line of code (line 476 on the M3C.R file):
df <- data.frame(m_matrix)
Many of my sample names (column names) started with a number and the data.frame() function added an "X" to the beginning of each name that started with a number ("1" becomes "X1"). This caused a mismatch with the names listed in neworder2.
To get around this problem, I changed all of my sample names to start with a letter and M3C is now running correctly.
Edit: This workaround can be easily applied by using the data.frame() function on your input dataset before running M3C.
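The renaming described above can be reproduced in base R without M3C. This is a small sketch on a made-up matrix; the sample names "1A" and "2B" are hypothetical.

```r
# Made-up expression matrix whose sample (column) names start with a digit
m_matrix <- matrix(
  1:4, nrow = 2,
  dimnames = list(c("gene1", "gene2"), c("1A", "2B"))
)

# data.frame() runs column names through make.names() by default (check.names = TRUE)
default_names <- names(data.frame(m_matrix))
default_names    # "X1A" "X2B"

# With check.names = FALSE the original names are kept
unchanged_names <- names(data.frame(m_matrix, check.names = FALSE))
unchanged_names  # "1A" "2B"

# One way to make names syntactic up front, so later conversions leave them alone
colnames(m_matrix) <- make.names(colnames(m_matrix))
fixed_names <- names(data.frame(m_matrix))
```

If the sample names also appear elsewhere (for example in a clinical annotation table), they would need the same renaming so that the two stay in sync.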

Gini Index in R

I am trying to calculate the Gini index for each row of my database. Each row is a customer and each column is a monthly session. So what I need to do is add a column with the Gini index by row, that is, for each customer across the 12 months.
See example attached
I found some examples online and did this:
Gini_index <- apply(DT_file[,c('sessions_201607_pct','sessions_201608_pct', 'sessions_201609_pct','sessions_201610_pct','sessions_201611_pct','sessions_201612_pct','sessions_201701_pct','sessions_201702_pct','sessions_201703_pct','sessions_201704_pct','sessions_201705_pct','sessions_201706_pct')], 1, gini)
However, I get the following error:
Error in match.fun(FUN) : object 'gini' not found
I have installed both ineq and reldist (and libraries), so I don't know why this isn't working.
Try this to get the Gini coefficient for each column:
library(ineq)
coeff= NULL
for (i in colnames(your_data[,-1])){
coeff= c(coeff,round(ineq(your_data[,i],type = 'Gini'),4))
}
# build the result directly so that Coeff stays numeric (cbind() would turn it into text)
data_coeff = data.frame(Coeff = coeff, Colnames = colnames(your_data[,-1]))
If you want it for each row instead, try this:
your_new_data = as.data.frame(t(your_data[,-1]))
colnames(your_new_data) = your_data[,1]
ind = NULL
for (i in colnames(your_new_data)){
ind = c(ind,round(ineq(your_new_data[,i],type = 'Gini'),4))
}
data_coeff = data.frame(Coeff = ind, customer = colnames(your_new_data))
Finally, add the coefficients to your data frame, for instance with a merge:
your_data_final = merge(your_data,data_coeff, by = "customer" )
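For the row-wise case, apply() over the session columns is a shorter alternative to transposing and looping. The sketch below uses a hand-written helper, gini_coef, which is hypothetical and written in base R so that it runs without extra packages; it uses the plain (uncorrected) Gini formula on sorted values. The toy DT_file has only four months and made-up numbers. As far as I know, the function in ineq is spelled Gini with a capital G, while reldist provides a lowercase gini, so "object 'gini' not found" suggests that no package defining a lowercase gini was attached at that point.

```r
# Hypothetical helper: uncorrected Gini coefficient of a non-negative vector
gini_coef <- function(x) {
  x <- sort(x)
  n <- length(x)
  sum((2 * seq_len(n) - n - 1) * x) / (n * sum(x))
}

# Made-up data: one row per customer, one column per month
DT_file <- data.frame(
  customer            = c("a", "b"),
  sessions_201607_pct = c(0.25, 0),
  sessions_201608_pct = c(0.25, 0),
  sessions_201609_pct = c(0.25, 0),
  sessions_201610_pct = c(0.25, 1)
)

# Pick the session columns by name pattern instead of typing them out
session_cols <- grep("^sessions_", names(DT_file), value = TRUE)

# MARGIN = 1 applies the function to each row, giving one value per customer
DT_file$gini <- apply(DT_file[, session_cols], 1, gini_coef)
DT_file$gini  # 0.00 0.75
```

Customer a has perfectly even sessions (Gini 0), while customer b has everything in one month (0.75 with four months under this formula).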

Error in as(x, class(k)) : no method or default for coercing “NULL” to “data.frame”

I am currently facing the error shown below, which is related to NULL values being coerced to a data frame. The data set does contain nulls; however, I have tried both is.na() and is.null() to replace the null values with something else. The data is stored on HDFS in pig.hive format. I have also attached the code below. The code works fine if I remove v[,25] from the key.
Code:
AM = c("AN");
UK = c("PP");
sample.map <- function(k,v){
key <- data.frame(acc = v[!which(is.na(v[,1])),1],
year = substr(v[!which(is.na(v[,1])),2],1,4),
month = substr(v[!which(is.na(v[,1])),2],5,6))
value <- data.frame(v[,3],count=1)
keyval(key,value)
}
sample.reduce <- function(key,v){
AT <- sum(v[which(v[,1] %in% AM=="TRUE"),2])
UnknownT <- sum(v[which(v[,1] %in% UK=="TRUE"),2])
Total <- AT + UnknownT
d <- data.frame(AT,UnknownT,Total)
keyval(key,d)
}
out <- mapreduce(input ="/user/hduser/input",
output = "/user/hduser/output",
input.format = make.input.format("pig.hive", sep = "\u0001"),
output.format = make.output.format("csv", sep = ","),
map = sample.map,
reduce = sample.reduce)
Error:
Warning in asMethod(object) : NAs introduced by coercion
Warning in split.default(1:rmr.length(y), unique(ind), drop = TRUE) : data length is not a multiple of split variable
Warning in rmr.split(x, x, FALSE, keep.rownames = FALSE) : number of items to replace is not a multiple of replacement length
Warning in split.default(1:rmr.length(y), unique(ind), drop = TRUE) :
data length is not a multiple of split variable
Warning in rmr.split(v, ind, lossy = lossy, keep.rownames = TRUE) : number of items to replace is not a multiple of replacement length
Error in as(x, class(k)) :
no method or default for coercing “NULL” to “data.frame”
Calls: <Anonymous> ... apply.reduce -> c.keyval -> reduce.keyval -> lapply -> FUN -> as
No traceback available
UPDATE
I have added the sample data and edited the code above. Hope this helps!
Sample Data:
NULL,"2014-03-14","PP"
345689202,"2014-03-14","AN"
234539390,"2014-03-14","PP"
123125444,"2014-03-14","AN"
NULL,"2014-03-14","AN"
901828393,"2014-03-14","AN"
There are some issues with as which have been identified recently. I don't see why as can't handle this by default, but you can modify coerce which handles the conversion with an S4 method to call as.data.frame.
setMethod("coerce",c("NULL","data.frame"), function(from, to, strict=TRUE) as.data.frame(from))
[1] "coerce"
as(NULL,"data.frame")
data frame with 0 columns and 0 rows

R shiny: how to create a dynamic list with names and values

I want to create a dynamic list with the names and values based on user inputs. I need to pass a list with the name of each factor, as well as two values for each factor, to a function. For example:
factor.names=list( A=c(-1,1),B=c(-1,1),C=c(-1,1),D=c(-1,1) )
The code below changes the factor values but leaves the names as nf1,nf2 etc.
if(input$fac==2){
names<-list(nf1 = c(input$l1,input$h1),nf2 = c(input$l2,input$h2))
}
I have tried using
names<-list(input$nf1 = c(input$l1,input$h1), input$nf2 = c(input$l2,input$h2))
But I keep on getting the following error:
Error in source(file, ..., keep.source = TRUE, encoding = checkEncoding(file)) :
C:\Users\Fred\Documents\App/server.R:49:59: unexpected '='
})
names<-list(n1 = c(input$l1,input$h1),input$nf2 =
^
I have also tried
n1<-reactive({
as.character(input$nf1)
})
names<-list(n1 = c(input$l1,input$h1),n2 = c(input$l2,input$h2))
}
But the names just stay as n1, n2 etc.
Any help or advice on the topic would be highly appreciated.
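One possible approach is to build the list without names and attach the names afterwards with setNames(). Inside list(), the left-hand side of = has to be a literal name, which is why list(input$nf1 = ...) is a syntax error. The sketch below uses a plain list called input as a stand-in for Shiny's input object, and the values in it are made up; in an app, the same two lines would sit inside a reactive or observer.

```r
# Stand-in for Shiny's `input` (hypothetical values typed by the user)
input <- list(
  nf1 = "Temperature", nf2 = "Pressure",
  l1 = -1, h1 = 1,
  l2 = 10, h2 = 20
)

# Build the values first, without names
factor_levels <- list(
  c(input$l1, input$h1),
  c(input$l2, input$h2)
)

# Attach the user-supplied names afterwards
factor.names <- setNames(factor_levels, c(input$nf1, input$nf2))
str(factor.names)
```

The same pattern extends to any number of factors, since both the values and the names are ordinary vectors that can be built in a loop or with lapply().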
