I am very new to R GUI programming.
I want the user to dynamically select columns in the data frame and then dynamically select the levels of the selected columns.
My intent is to let users select columns and filter values and then get the data frame filtered on those. Getting the column names works correctly. However, while fetching the levels of the selected columns, the for loop exits and the selected values are not captured in the cba and cbv variables.
items<-colnames(joined_final)
items<-levels(joined_final$State)
cbg<-gcheckboxgroup(items,cont=TRUE,use.table = TRUE, index=TRUE,container = w)
cb<-svalue(cbg,index=TRUE)
j<-length(cb)
func(joined_final,j,cb)
func<-function(joined_final,j,cb){
  cbv=c()
  for (i in seq(j)){
    items_1<-levels(joined_final[,cb[i]])
    cba<-gcheckboxgroup(items_1,cont=TRUE,use.table = TRUE, container = w)
    cbv<-svalue(cba)
  }
  return(cbv)
}
Please help me with this. Thanks in advance
First of all thank you for reading my post.
I would like to ask how I can replicate R's subset mechanism in Excel VBA.
Here is my R code:
Subdeck2 = deck2[(deck2[,3]>=10 & deck2[,4]<=30),]
The code creates a data.frame object called Subdeck2, a subset of the data.frame object deck2 containing the rows of deck2 whose third column value is greater than or equal to 10 and whose fourth column value is less than or equal to 30.
I would like to replicate this in Excel VBA, producing a worksheet that is a subset of the worksheet with the source data. I think the array naming in Excel is very helpful for referencing the rows and columns.
In R, it tends to get confusing when I have to do this repeatedly, because I have to remember the row and column numbers that I have already entered.
I only need to do this one particular thing in Excel. I already bought a book about VBA programming, but it's about 1000 pages long and I can't seem to find the word "subset" in it.
Any suggestions on how to do this, or on where I can learn to do it, would be much appreciated. Thanks!
Here is an example, although it is nowhere near as concise as your R code.
The method is commented, but basically it iterates over the rows of the source range and checks each row against the criteria, collecting the matching rows into one range. It then copies that range to the output location. Copy is used rather than resizing the output range and assigning Value, because Rows.Count and Value on a range made of non-adjacent rows only cover its first area.
Option Explicit

Sub FilterLikeRSubset()

    Dim rngData As Range
    Dim rngRow As Range
    Dim rngFilter As Range
    Dim rngOutput As Range

    'get data
    Set rngData = ThisWorkbook.Worksheets("Sheet1").Range("A1:D5")

    'iterate rows in data
    For Each rngRow In rngData.Rows
        'test row criteria
        If rngRow.Cells(1, 3) >= 10 And rngRow.Cells(1, 4) <= 30 Then
            'success: add the row to the filtered range
            If rngFilter Is Nothing Then
                Set rngFilter = rngRow
            Else
                Set rngFilter = Union(rngFilter, rngRow)
            End If
        End If
    Next rngRow

    'no row met the criteria, so there is nothing to output
    If rngFilter Is Nothing Then Exit Sub

    'top-left cell of the output
    Set rngOutput = ThisWorkbook.Worksheets("Sheet1").Range("A10")

    'output: the matching rows are pasted as one contiguous block
    rngFilter.Copy Destination:=rngOutput

End Sub
I have an 'Agency_Reference' table containing a column 'agency_lookup', with 200 string entries such as:
alpha
beta
gamma etc..
I have a data frame 'TEST' with a million rows containing a 'Campaign' column with entries such as:
Alpha_xt2010
alpha_xt2014
Beta_xt2016 etc..
I want to loop through each entry in the reference table, find which string is present within each entry of the Campaign column, and create a new agency_identifier column in the TEST table.
My current code is below and is slow to execute. I would appreciate guidance on how to optimize it, and I would like to learn how to do it the data.table way.
Agency_Reference <- data.frame(agency_lookup = c('alpha','beta','gamma','delta','zeta'))
TEST <- data.frame(Campaign = c('alpha_xt123','ALPHA345','Beta_xyz_34','BETa_testing','code_delta_'))
TEST$agency_identifier <- 0
for (agency_lookup in as.vector(Agency_Reference$agency_lookup)) {
  # keep the current value unless this agency name occurs in the campaign string
  TEST$agency_identifier <- ifelse(grepl(tolower(agency_lookup), tolower(TEST$Campaign)),
                                   agency_lookup, TEST$agency_identifier)
}
Expected output:
Campaign----agency_identifier
alpha_xt123----alpha
ALPHA345----alpha
Beta_xyz_34----beta
BETa_testing----beta
code_delta_----delta
Try:
TEST <- data.frame(Campaign = c('alpha_xt123','ALPHA345','Beta_xyz_34','BETa_testing','code_delta_'))
pattern = tolower(c('alpha','Beta','gamma','delta','zeta'))
TEST$agency_identifier <- sub(pattern = paste0('.*(', paste(pattern, collapse = '|'), ').*'),
                              replacement = '\\1',
                              x = tolower(TEST$Campaign))
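One caveat with the sub() approach is that a campaign matching none of the agencies comes back as the whole lower-cased string rather than being flagged. Below is a minimal base R sketch of a variant that returns NA in that case. It assumes the sample data from the question plus one hypothetical non-matching campaign ('other_123'), and it is only one possible way to do this, not the data.table way the question asks about.

```r
# Sample data from the question, plus one hypothetical non-matching campaign
campaign <- c('alpha_xt123', 'ALPHA345', 'Beta_xyz_34', 'BETa_testing',
              'code_delta_', 'other_123')
agencies <- c('alpha', 'beta', 'gamma', 'delta', 'zeta')

# One alternation pattern, matched once against the whole lower-cased column
lowered <- tolower(campaign)
hit <- regexpr(paste(agencies, collapse = '|'), lowered)

# Default to NA, then fill in the matched agency wherever a match was found
agency_identifier <- rep(NA_character_, length(campaign))
agency_identifier[hit > 0] <- regmatches(lowered, hit)

print(data.frame(campaign, agency_identifier))
```

Like the sub() version, this is vectorised over the campaigns, so the regular expression engine does the scanning instead of an R-level loop over the 200 agencies.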
This will not answer your question per se, but from what I understand you want to dissect the Campaign column and do something with the values it provides.
Take a look at Tidy data, more specifically the part "Multiple variables stored in one column". I think you'll make some great progress using tidyr::separate. That way you don't have to use a for-loop.
I am trying to use a custom function inside 'ddply' in order to create a new variable (NormViability) in my data frame, based on values of a pre-existing variable (CelltiterGLO).
The function is meant to create a rescaled (%) value of 'CelltiterGLO' based on the mean 'CelltiterGLO' values at a specific sub-level of the variable 'Concentration_nM' (0.01).
So if the mean of 'CelltiterGLO' at 'Concentration_nM'==0.01 is set as 100, I want to rescale all other values of 'CelltiterGLO' over the levels of other variables ('CTSC', 'Time_h' and 'ExpType').
The normalization function is the following:
normalize.fun = function(CelltiterGLO) {
  idx = Concentration_nM==0.01
  jnk = mean(CelltiterGLO[idx], na.rm = TRUE)
  out = 100*(CelltiterGLO/jnk)
  return(out)
}
and this is the code I try to apply to my dataframe:
library("plyr")
df.bis=ddply(df,
             .(CTSC, Time_h, ExpType),
             transform,
             NormViability = normalize.fun(CelltiterGLO))
The code runs, but when I double-check (with aggregate or tapply) whether the mean of 'NormViability' equals 100 at 'Concentration_nM'==0.01, I do not get 100 but different numbers. However, if I first subset my df by the two levels of the variable 'ExpType', the code returns the correct numbers on each separate subset. I tried making 'ExpType' either character or factor but got similar results. 'ExpType' has two levels/values, "Combinations" and "DoseResponse".
I can't figure out why the code is not working on the entire df. I wonder whether this is because the two levels of 'ExpType' do not contain the same number of levels for all the other variables, e.g. one of the levels of 'Time_h' is missing for the level "Combinations" of 'ExpType'.
Thanks very much for your help and I apologize in advance if the answer is already present in Stackoverflow and I was not able to find it.
Michele
I (the OP) found out that the function used one variable in its body that was missing from its arguments. Simply adding the variable Concentration_nM as an argument of the custom function solved the problem.
THANKS
m.
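The fix described above can be sketched as follows. This is a minimal illustration rather than the OP's actual code: the sample data are made up, and a base R split/lapply stands in for ddply so that the sketch runs without plyr. With plyr, the corresponding change would presumably be to pass both columns in the ddply call, i.e. NormViability = normalize.fun(CelltiterGLO, Concentration_nM).

```r
# Corrected function: the concentration is now an explicit argument, so the
# reference mean is computed from the rows of the current group only
normalize.fun <- function(CelltiterGLO, Concentration_nM) {
  idx <- Concentration_nM == 0.01
  ref <- mean(CelltiterGLO[idx], na.rm = TRUE)
  100 * CelltiterGLO / ref
}

# Made-up sample data (hypothetical values, for illustration only)
df <- data.frame(
  ExpType          = rep(c("Combinations", "DoseResponse"), each = 4),
  Concentration_nM = rep(c(0.01, 0.01, 1, 10), times = 2),
  CelltiterGLO     = c(200, 220, 150, 50, 1000, 900, 500, 100)
)

# Base R stand-in for ddply(df, .(ExpType), transform, ...)
df.bis <- do.call(rbind, lapply(split(df, df$ExpType), function(g) {
  g$NormViability <- normalize.fun(g$CelltiterGLO, g$Concentration_nM)
  g
}))

# Check: mean NormViability at the reference concentration, per group
at_ref <- df.bis$Concentration_nM == 0.01
check <- tapply(df.bis$NormViability[at_ref], df.bis$ExpType[at_ref], mean)
print(check)
```

Passing the grouping-dependent column as an argument keeps the function free of lookups outside its own scope, which is what makes the per-group result independent of the rest of the data frame.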
For a marketing class I have to write a function that calculates the retention rate of the customers (the probability that a customer is still a customer). So far I have isolated the ids of the individual customers and stored them in the matrix first.transactions.data. I then split them into cohorts (groups of customers by time) with split() and stored them in the list cohort.
Now comes my problem: I calculated another sub-matrix from the full data set, called final.period.data, on which I will calculate the retention rate. To do that, I have to isolate the ids in final.period.data for each cohort. My instructor told me that I should create an additional column in final.period.data that shows TRUE or FALSE depending on whether the cohort's id and final.period.data's id are the same. For this I tried to use exists, but I always receive error messages. I tried the following:
final.period.data <- if(exists(cohort$'1'$id, where = final.period.data$id) final.period.data$same = TRUE)
but I always receive error messages such as "unexpected symbol" or "invalid first argument". I also tried to convert the list cohort into a matrix, but this didn't help either. How do I have to change the exists command, or is there a simpler way to locate the cohort's ids in final.period.data?
Thank you for your help.
You can just create a function that does what you want:
funct <- function(final.period.data) {
  if (final.period.data$cohort == '1' & final.period.data$id == <condition2>) {
    # Change the number for the TRUE condition
  } else {
    # If it doesn't fit the two conditions
    # Change the number for the FALSE condition
  }
}
# create the new column, then add it to the data frame
vector <- logical(nrow(final.period.data))
final.period.data <- cbind(final.period.data, same = vector)
Then use it with an apply function; see the documentation for apply for more information.
But I usually do it with a for loop, first creating the new column and then adding it to the data frame.
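As a simpler alternative to exists(), which looks up objects by name rather than values in a vector, the membership test can be written with %in%. The following is a minimal sketch under assumed data: the ids, the amount column, and the retention definition (share of cohort ids that reappear in the final period) are hypothetical and only there to make the example runnable.

```r
# Hypothetical cohorts, as produced by split(): a list of data frames by cohort
cohort <- list(
  `1` = data.frame(id = c(101, 102, 103)),
  `2` = data.frame(id = c(201, 202))
)

# Hypothetical final-period transactions
final.period.data <- data.frame(
  id     = c(102, 201, 103, 999),
  amount = c(10, 25, 5, 40)
)

# TRUE where the transaction's id belongs to cohort 1, FALSE otherwise
final.period.data$same <- final.period.data$id %in% cohort[["1"]]$id

# One possible retention measure: share of cohort 1 seen in the final period
retention <- mean(cohort[["1"]]$id %in% final.period.data$id)

print(final.period.data)
print(retention)
```

Because %in% is vectorised, no loop or if statement is needed, and the same line can be repeated (or wrapped in lapply) for each cohort in the list.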
I am using some R code that uses a data table class, instead of a data frame class.
How would I do the following operation in R without having to transform map.dt to a map.df?
map.dt = data.table(chr = c("chr1","chr1","chr1","chr2"), ref = c(1,0,3200,3641), pat = c(1,3020,3022, 3642), mat = c(1,0,3021,0))
parent = "mat"
chrom = "chr1"
map.df<-as.data.frame(map.dt);
parent.block.starts<-map.df[map.df$chr == chrom & map.df[,parent] > 0,parent];
Note: parent needs to be set dynamically, since it is an input from the user. In this example I chose "mat", but it could be any of the columns.
Note1: parent.block.starts should be a vector of integers.
Note2: map.dt is a data table where the column names are c("chr","ref","pat","mat").
The problem is that in data tables I cannot access a given column by name, or at least I couldn't figure out how.
Please let me know if you have some suggestions!
Thanks!
It's a little unclear what the end goal is here, especially without sample data, but if you want to access columns by character name there are two ways to do this:
Columns = c("A", "B")
# .. means "look up one level"
dt[,..Columns]
dt[,get("A")]
dt[,list(get("A"), get("B"))]
But if you find yourself needing to use this technique often, you're probably using data.table poorly.
EDIT
Based on your edit, this line will return the same result, without having to do any as.data.frame conversion:
map.dt[chr==chrom & get(parent) > 0, get(parent)]
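For completeness, here is a self-contained sketch using the sample data from the question. It assumes the data.table package is installed and shows a variant that filters the rows first and then extracts the column with [[ ]], which also accepts a column name held in a character variable.

```r
library(data.table)

# Sample data from the question
map.dt <- data.table(chr = c("chr1", "chr1", "chr1", "chr2"),
                     ref = c(1, 0, 3200, 3641),
                     pat = c(1, 3020, 3022, 3642),
                     mat = c(1, 0, 3021, 0))
parent <- "mat"   # user input: any of "ref", "pat", "mat"
chrom  <- "chr1"

# Filter rows inside the data.table, then pull the chosen column as a vector
parent.block.starts <- map.dt[chr == chrom & get(parent) > 0][[parent]]
print(parent.block.starts)
```

With this sample data the columns are stored as doubles, so the result is a numeric vector; wrapping it in as.integer() would give the integer vector mentioned in the question.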