Gini Index in R

I am trying to calculate the Gini index for each row of my data. Each row is a customer and each column holds that customer's sessions for one month. What I need is a new column with the Gini index by row, i.e. for each customer across the 12 months.
See example attached
I found some examples online and did this:
Gini_index <- apply(DT_file[,c('sessions_201607_pct','sessions_201608_pct', 'sessions_201609_pct','sessions_201610_pct','sessions_201611_pct','sessions_201612_pct','sessions_201701_pct','sessions_201702_pct','sessions_201703_pct','sessions_201704_pct','sessions_201705_pct','sessions_201706_pct')], 1, gini)
However, I get the following error:
Error in match.fun(FUN) : object 'gini' not found
I have installed both ineq and reldist (and loaded them with library()), so I don't know why this isn't working.

Try this to get the Gini coefficient for each column:
library(ineq)
# the first column is the customer id, the remaining columns are the months
coeff = NULL
for (i in colnames(your_data[,-1])){
coeff = c(coeff, round(ineq(your_data[[i]], type = 'Gini'), 4))
}
data_coeff = data.frame(Coeff = coeff, Colnames = colnames(your_data[,-1]))
If you want it for each row instead, try this:
# transpose so that each customer becomes a column
your_new_data = as.data.frame(t(your_data[,-1]))
colnames(your_new_data) = as.character(your_data[[1]])
ind = NULL
for (i in colnames(your_new_data)){
ind = c(ind, round(ineq(your_new_data[[i]], type = 'Gini'), 4))
}
data_coeff = data.frame(Coeff = ind, customer = colnames(your_new_data))
Finally, add the coefficients to the end of your data frame, for instance with merge():
your_data_final = merge(your_data,data_coeff, by = "customer" )
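If you only need one coefficient per customer, the transpose and merge can be skipped: apply() with MARGIN = 1 already works row by row. The sketch below is a minimal, package-free version, assuming the standard uncorrected Gini definition; gini_coef and the toy your_data are hypothetical stand-ins, and with ineq loaded you could pass function(x) ineq(x, type = "Gini") instead. Note also that R names are case-sensitive: ineq, for instance, exports its function as Gini() with a capital G, whereas the apply() call in the question looks for a lowercase gini.

```r
# Hypothetical helper: uncorrected Gini coefficient of a non-negative vector
gini_coef <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)] else if (anyNA(x)) return(NA_real_)
  n <- length(x)
  if (n == 0 || sum(x) == 0) return(NA_real_)  # undefined for empty or all-zero rows
  x <- sort(x)
  sum((2 * seq_len(n) - n - 1) * x) / (n * sum(x))
}

# Toy data shaped like the question: one row per customer, one column per month
your_data <- data.frame(
  customer = c("A", "B"),
  sessions_201607_pct = c(0.25, 1),
  sessions_201608_pct = c(0.25, 0),
  sessions_201609_pct = c(0.25, 0),
  sessions_201610_pct = c(0.25, 0)
)

# Pick the monthly columns by name pattern, then apply the function to each row
session_cols <- grep("^sessions_.*_pct$", names(your_data), value = TRUE)
your_data$gini <- round(apply(your_data[, session_cols], 1, gini_coef), 4)
print(your_data)
```

Customer A, whose sessions are spread evenly, gets 0; customer B, with everything in one month, gets 0.75 (the uncorrected maximum for four months is (n - 1) / n).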

Related

Making a function that builds a dataframe

I'm trying to make a function that builds a data frame and returns it. The new data frame is made of columns taken from another data frame I have, called metadata, plus some additional data that I want to include or not by passing TRUE or FALSE when calling the function.
Here is what I did:
make_data = function(metric, use_additions = FALSE){
data = data.frame(my_metric = metadata[['metric']], gender = metadata$Gender ,
age = as.numeric(metadata$Age) , use_additions = t(additional_data))
data = data %>% dplyr::select(my_metric, everything())
return(data)
}
data = make_data(CR, FALSE)
I want to pass a different metric each time, while all other features stay the same. For example, here I called the function with metric as CR, which is the name of the column I want from metadata. The argument I want to control is use_additions: sometimes I want to add the additional data and sometimes I don't.
metadata and additional_data have exactly the same row names and the same number of rows. It's just a matter of adding the data or not.
I get this error(s):
Error in data.frame(metric = metadata[["metric"]], gender = metadata$Gender, :
arguments imply differing number of rows: 0, 1523
In addition: Warning message:
In data.frame(metric = metadata[["metric"]], gender = metadata$Gender, :
Error in data.frame(my_metric = metadata[["metric"]], gender = metadata$Gender, :
arguments imply differing number of rows: 0, 1523
I've tried several ways to do this, with quotes and without, and using $, but none of them worked. For example, when I type metric = metadata[[metric]] I get this:
Error in (function(x, i, exact) if (is.matrix(i)) as.matrix(x)[[i]] else .subset2(x, :
object 'CR' not found
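Pass the column name as a string, index with [[ ]] so that you get a plain vector, and bind the additional columns only when requested:
make_data = function(colname, use_additions = FALSE){
data = data.frame(my_metric = metadata[[colname]], gender = metadata$Gender,
age = as.numeric(metadata$Age))
if (use_additions) data = cbind(data, additional_data)
return(data)
}
data = make_data("CR", FALSE)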
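The difference between quoting the argument name and passing its value is easy to check on toy data. In this sketch metadata and additional_data are small hypothetical stand-ins; it shows that metadata[["metric"]] looks for a column literally called "metric" (and finds nothing), that single brackets return a one-column data frame while double brackets return the vector itself, and that cbind() can append the extra columns once the rows are aligned by name.

```r
# Toy stand-ins for metadata and additional_data (hypothetical values)
metadata <- data.frame(CR = c(0.2, 0.5, 0.9),
                       Gender = c("F", "M", "F"),
                       Age = c("34", "51", "29"),
                       row.names = c("s1", "s2", "s3"))
additional_data <- data.frame(extra1 = c(1, 2, 3),
                              extra2 = c(10, 20, 30),
                              row.names = c("s1", "s2", "s3"))

# A quoted name is used literally: there is no column called "metric"
print(is.null(metadata[["metric"]]))

colname <- "CR"
print(class(metadata[colname]))     # single brackets keep a data frame
print(class(metadata[[colname]]))   # double brackets extract the numeric vector

# Build the base columns, then append the extra columns with rows matched by name
base <- data.frame(my_metric = metadata[[colname]],
                   gender = metadata$Gender,
                   age = as.numeric(metadata$Age),
                   row.names = rownames(metadata))
with_additions <- cbind(base, additional_data[rownames(base), , drop = FALSE])
print(dim(with_additions))
```

Indexing additional_data by rownames(base) is a defensive choice: it keeps the result correct even if the two tables were not stored in the same row order.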

Error in if (ncol(spc1$amp) > ncol(spc2$amp)) { : argument is of length zero

I am using warbleR in R to do some acoustic analyses. As freq_range couldn't detect all the bottom frequencies very well, I created a data frame manually with the correct bottom frequencies, loaded it into R and turned it into a selection table. track_freq_contour, compare.methods and freq_DTW all work fine, although freq_DTW gives this warning message:
Warning message: In (0:(n - 1)) * f : NAs produced by integer overflow
However, if I run cross_correlation, I get the following error:
Error in if (ncol(spc1$amp) > ncol(spc2$amp)) { :
argument is of length zero
I do not get this error with a selection table whose bottom and top frequencies were added by freq_range instead of manually. What could be the issue here? The two selection tables look similar:
This is the selection table partly made by R through freq_range:
And this is the one with the bottom frequencies added manually (which has more sound files than the one before):
This is part of the code I use:
#Comparing methods for quantitative analysis of signal structure
compare.methods(X = stnew, flim = c(0.6,2.5), bp = c(0.6,2.5), methods = c("XCORR", "dfDTW"))
#Measure acoustic parameters with spectro_analysis
paramsnew <- spectro_analysis(stnew, bp = c(0.6,2), threshold = 20)
write.csv(paramsnew, "new_acoustic_parameters.csv", row.names = FALSE)
#Remove parameters derived from fundamental frequency
paramsnew <- paramsnew[, grep("fun|peakf", colnames(paramsnew), invert = TRUE)]
#Dynamic time warping
dm <- freq_DTW(stnew, length.out = 30, flim = c(0.6,2), bp = c(0.6,2), wl = 300, img = TRUE)
str(dm)
#Spectrographic cross-correlation
xcnew <- cross_correlation(stnew, wl = 300, na.rm = FALSE)
str(xcnew)
Any idea what I'm doing wrong?
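Without seeing both tables it is hard to say what differs, but since only the manually edited table fails, one thing worth ruling out is a problem in the frequency columns themselves. The sketch below is a hypothetical base-R helper (check_sel_tab, not part of warbleR) that flags missing or non-numeric values, rows where bottom.freq is not below top.freq, and values that look like Hz rather than kHz. It assumes warbleR-style column names and frequencies in kHz; the 50 kHz cut-off is an arbitrary illustration, and none of these checks is a confirmed cause of the cross_correlation error.

```r
# Hypothetical helper: flag suspicious rows in a manually edited selection table.
# Assumes columns start, end, bottom.freq, top.freq, with frequencies in kHz.
check_sel_tab <- function(X, max_khz = 50) {
  cols <- c("start", "end", "bottom.freq", "top.freq")
  missing_cols <- setdiff(cols, names(X))
  if (length(missing_cols) > 0) return(paste("missing column:", missing_cols))
  non_num <- cols[!vapply(X[cols], is.numeric, logical(1))]
  if (length(non_num) > 0) return(paste("not numeric:", non_num))

  problems <- character(0)
  na_rows <- which(rowSums(is.na(X[cols])) > 0)
  if (length(na_rows) > 0)
    problems <- c(problems, paste("NA values in rows:", paste(na_rows, collapse = ", ")))
  inverted <- which(X$bottom.freq >= X$top.freq)
  if (length(inverted) > 0)
    problems <- c(problems, paste("bottom.freq >= top.freq in rows:", paste(inverted, collapse = ", ")))
  too_high <- which(X$top.freq > max_khz)  # arbitrary threshold, Hz entered instead of kHz?
  if (length(too_high) > 0)
    problems <- c(problems, paste("suspiciously high top.freq in rows:", paste(too_high, collapse = ", ")))
  problems
}

# Toy table with one missing value (row 2) and one inverted range (row 3)
st <- data.frame(sound.files = c("a.wav", "b.wav", "c.wav"),
                 selec = 1:3,
                 start = c(0.1, 0.2, 0.3),
                 end = c(0.5, 0.6, 0.7),
                 bottom.freq = c(0.6, NA, 2.4),
                 top.freq = c(2.5, 2.5, 2.2))
issues <- check_sel_tab(st)
print(issues)
```

Running the same helper on stnew and on the table produced by freq_range, and comparing str() of both, should show whether the manual entry introduced anything of this kind.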

Error while using tm1r to send dataset. How to debug this issue?

I am currently developing a database-integrated forecasting tool for a customer, mainly with R and TM1 Perspectives. To connect R with TM1 I use tm1r. Importing data from TM1 into R works fine.
However, when I am trying to write back the calculated forecast from R to tm1, I run into problems.
Below is some reprex data so you can have a look at the output. If I use tm1_send_data instead of tm1_send_dataset, it works fine too.
The latter function gives me the Error :
"Error in if (is.character(txt) && length(txt) == 1 && nchar(txt, type = "bytes") < :
missing value where TRUE/FALSE needed"
I have no idea what this means. I tried reformatting the data types, without any effect.
library(tm1r)
# data
values <- data.frame(fake_values =
c(105,147,159,232,312,337,285,188,257,10,98,27)
)
date_stamps <- c("2021001","2021002","2021003","2021004","2021005","2021006","2021007","2021008","2021009","2021010","2021011","2021012")
rownames(values) = date_stamps
# Send dataset to TM1
con_obj <- tm1_connection("localhost", "8840", "test_admin", "")
tm1_send_dataset(
con_obj,
valueset = values, cube = "pvl_FORECAST_HILFSWÜRFEL",
rowdim = "PVL_test_Zeit", coldim = "pvl_Produkt",
titledim1 = "DATENART", titleel1 = "FC",
titledim2 = "Version", titleel2 = 'Version_Bearbeitung',
titledim3 = "FC-Scheibe", titleel3 = "ML_FC_2021",
titledim4 = "PVL_test_Kunde", titleel4 = "MGR_domestic_D",
titledim5 = "PVL_test_Measure", titleel5 = "Menge_EA"
)
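The R error "missing value where TRUE/FALSE needed" means the condition of an if() evaluated to NA instead of TRUE or FALSE. For the condition quoted in the message, that happens when txt is a single missing string: nchar() returns NA for NA_character_ (also with type = "bytes"), so the whole && chain becomes NA. The sketch below reproduces this in base R; that tm1_send_dataset actually ends up passing such a value is an assumption, not something confirmed from tm1r's source. To see which function raised the error, calling traceback() right after it prints the call stack, and options(error = recover) lets you inspect the local variables of each frame.

```r
# A length-1 missing string, as a hypothetical value reaching that check
txt <- NA_character_

# nchar() of a missing string is NA, so the && chain is NA instead of TRUE/FALSE
cond <- is.character(txt) && length(txt) == 1 && nchar(txt, type = "bytes") < 2084
print(cond)

# if() rejects an NA condition with the message seen in the question
msg <- tryCatch(if (cond) "ok" else "not ok",
                error = function(e) conditionMessage(e))
print(msg)

# Cheap checks on the reprex inputs before sending anything to TM1
values <- data.frame(fake_values = c(105, 147, 159, 232, 312, 337, 285, 188, 257, 10, 98, 27))
rownames(values) <- sprintf("2021%03d", 1:12)   # "2021001" ... "2021012"
print(anyNA(values))
print(anyNA(rownames(values)))
```

If the inputs are clean, the NA is more likely coming from something the function builds or receives internally (for example a response it tries to parse), which is where traceback() helps.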

Problem with for loop when downloading species occurrence data

I want to download occurrence data from the GBIF website and I use the following R script. When I run the script, I get an error with the following message: "Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 1, 0)". I would highly appreciate it if anyone could help me with this.
My data: data
My R script:
flist<-read_excel("Mekong fish.xlsx",sheet="Sheet1")
##Loop
fname<-list()
Occ<-list()
datfish<-list()
name_list<-unique(flist$Updated_name)
# create for loop to produce ggplot2 graphs
for (i in seq_along(name_list)) {
# create plot for each Occurrence in df
Occ[[i]] <-occ_search(scientificName = name_list[i], limit=2)
fname[[i]]<-occ_search(scientificName = name_list[i],
fields = c("species", "country","decimalLatitude", "decimalLongitude"),
hasCoordinate=T, limit= Occ[[i]]$meta[4],return ="data")
datfish[[i]]<-as.data.frame(fname[[i]]$data)
}
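The message in the question is what data.frame() reports when one argument has a single element and another has none; one plausible source inside such a loop is a species for which the download returns no records. A minimal base-R sketch of reproducing the error and guarding the loop against empty results follows. fake_search() is a hypothetical offline stand-in, not the rgbif API; in the real loop the same nrow() check would go around the result of the occurrence search.

```r
# Reproduce the error: one argument of length 1, another of length 0
msg <- tryCatch(
  data.frame(species = "Aaptosyax grypus", country = character(0)),
  error = function(e) conditionMessage(e)
)
print(msg)

# Hypothetical stand-in for a per-species download (not the rgbif API):
# returns a data frame that may have zero rows
fake_search <- function(name) {
  if (name == "Unknown fish") {
    data.frame(species = character(0), decimalLatitude = numeric(0))
  } else {
    data.frame(species = name, decimalLatitude = 13.5)
  }
}

name_list <- c("Aaptosyax grypus", "Unknown fish")
datfish <- list()
for (i in seq_along(name_list)) {
  res <- fake_search(name_list[i])
  # Skip species with no usable records instead of building a 0-row piece
  if (is.null(res) || nrow(res) == 0) {
    message("No records for ", name_list[i], ", skipping")
    next
  }
  datfish[[name_list[i]]] <- res
}

all_records <- do.call(rbind, datfish)
print(nrow(all_records))
```

Wrapping the real download call in tryCatch() as well lets the loop continue past a name that fails outright, instead of stopping the whole run.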
I got a different error:
Expecting logical in D1424 / R1424C4: got 'in Lao'
Expecting logical in D1426 / R1426C4: got 'in China'
Expecting logical in D1467 / R1467C4: got 'only Cambodia'
Expecting logical in D1469 / R1469C4: got 'only in VN'
Expecting logical in D1473 / R1473C4: got 'only in China'
Expecting logical in D1486 / R1486C4: got 'only in Malaysia'
Expecting logical in D1488 / R1488C4: got 'only 1 point in VN'
I think the problem is caused by some fields in the 4th column. I don't have the right packages installed to run your code, but once I dropped the fourth column I got a different error (package missing).
flist<-read_excel("~/Downloads/Mekong fish.xlsx",sheet="Sheet1")
flist <- subset(flist, select = -4)
...
EDIT:
This worked for me: read_excel guessed the type of column 4 as logical (boolean). When I explicitly set it to text, it worked.
library(readxl)
library(rgbif)
library(raster)
flist<-read_excel("~/Downloads/Mekong fish.xlsx",
sheet="Sheet1",
col_types = c("numeric", "text", "numeric", "text"))
flist
##Loop
fname<-list()
Occ<-list()
datfish<-list()
name_list<-unique(flist$Updated_name)
# create for loop to produce ggplot2 graphs
for (i in seq_along(name_list[1:2])) {
message(i)
# # create plot for each Occurrence in df
Occ[[i]] <-occ_search(scientificName = name_list[i], limit=2)
message(Occ[[i]])
fname[[i]]<-occ_search(scientificName = name_list[i],
fields = c("species", "country","decimalLatitude", "decimalLongitude"),
hasCoordinate=T, limit= Occ[[i]]$meta[4],return ="data")
message(fname[[i]])
datfish[[i]]<-as.data.frame(fname[[i]]$data)
message(datfish[[i]])
}
1
list(offset = 0, limit = 2, endOfRecords = FALSE, count = 15)list(list(name = c("Animalia", "Chordata", "Actinopterygii",
"Cypriniformes", "Cyprinidae", "Aaptosyax", "Aaptosyax grypus"), key =
c("1", "44", "204", "1153", "7336", "2363805", "2363806"),
etc...

How to access data saved in an assign construct?

I made a list, loop over it with a for loop, do some calculations on each element and export each modified data frame under a new name ([1] "IAEA_C2_NoStdConditionResiduals1", [2] "IAEA_C2_EAstdResiduals2", etc.). When I run View(IAEA_C2_NoStdConditionResiduals1) after the loop, I get the following error in the console: Error in print(IAEA_C2_NoStdConditionResiduals1) : object 'IAEA_C2_NoStdConditionResiduals1' not found. But I know the object exists, because RStudio shows it in its Environment pane. So the question is: how can I access the data saved with assign() for further use?
ResidualList = list(IAEA_C2_NoStdCondition = IAEA_C2_NoStdCondition,
IAEA_C2_EAstd = IAEA_C2_EAstd,
IAEA_C2_STstd = IAEA_C2_STstd,
IAEA_C2_Bothstd = IAEA_C2_Bothstd,
TIRI_I_NoStdCondition = TIRI_I_NoStdCondition,
TIRI_I_EAstd = TIRI_I_EAstd,
TIRI_I_STstd = TIRI_I_STstd,
TIRI_I_Bothstd = TIRI_I_Bothstd
)
C = 8
for(j in 1:C) {
#convert list Variable to string for later usage as Variable Name as unique identifier!!
SubNameString = names(ResidualList)[j]
SubNameString = paste0(SubNameString, "Residuals")
#print(SubNameString)
LoopVar = ResidualList[[j]]
LoopVar[ ,"F_corrected_normed"] = round(LoopVar[ ,"F_corrected_normed"] / mean(LoopVar[ ,"F_corrected_normed"]),
digits = 5
)
LoopVar[ ,"F_corrected_normed_error"] = round(LoopVar[ ,"F_corrected_normed_error"] / mean(LoopVar[ ,"F_corrected_normed_error"]),
digits = 5
)
assign(paste(SubNameString, j), LoopVar)
}
View(IAEA_C2_NoStdConditionResiduals1)
This is not really a problem with assign() but with the behaviour of paste(), which separates its arguments with a space by default. This builds a variable name with a space in it:
assign(paste(SubNameString, j), LoopVar)
#simple example
> assign(paste("v", 1), "test")
> `v 1`
[1] "test"
So you need to refer to it with backticks around its name, so that the space is not parsed as a delimiter. See what happens when you type:
`IAEA_C2_NoStdConditionResiduals 1`
From here on, use paste0() to avoid this problem.
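Another route is to avoid assign() altogether and keep the results in a named list, which makes them easy to loop over or retrieve by name later; this is a common R idiom rather than the only fix. The sketch below uses two tiny hypothetical data frames in place of the real IAEA/TIRI data, applies the same mean-normalisation as the loop in the question, and also shows get() retrieving a name that contains a space.

```r
# Hypothetical stand-ins for two of the original data frames
ResidualList <- list(
  IAEA_C2_NoStdCondition = data.frame(F_corrected_normed = c(2, 4, 6),
                                      F_corrected_normed_error = c(0.1, 0.2, 0.3)),
  IAEA_C2_EAstd = data.frame(F_corrected_normed = c(1, 1, 4),
                             F_corrected_normed_error = c(0.5, 0.5, 0.5))
)

# Divide each column by its mean and round to 5 digits, as in the question's loop
normalise <- function(df) {
  for (col in c("F_corrected_normed", "F_corrected_normed_error")) {
    df[[col]] <- round(df[[col]] / mean(df[[col]]), digits = 5)
  }
  df
}

# Results stay together in one list, named with paste0() (no separator)
Residuals <- lapply(ResidualList, normalise)
names(Residuals) <- paste0(names(ResidualList), "Residuals")
print(Residuals$IAEA_C2_NoStdConditionResiduals)

# The name created by paste() contains a space; get() can still retrieve it
assign(paste("v", 1), "test")
print(get("v 1"))
```

With the list approach, View(Residuals$IAEA_C2_NoStdConditionResiduals) works directly, and nothing is written into the global environment.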
