I have tried to get multiple outputs from a function I wrote:
ratio_marker_out_2 = function(marker_gene, cluster_id){
  marker_gene = list(row.names(FindMarkers(glioblastoma, ident.1 = cluster_id)))
  for (gene in marker_gene){
    all_cells_all_markers = glioblastoma@assays$RNA@counts[gene,]
    selected_cells_all_marker = all_cells_all_markers[cluster_id!=Idents(glioblastoma)]
    gene_count_out_cluster = glioblastoma@assays$RNA@counts[,cluster_id!=Idents(glioblastoma)]
    ratio_out = sum(selected_cells_all_marker)/sum(gene_count_out_cluster)
  }
  return(ratio_out)
}
Here, marker_gene contains several hundred genes; let's say 100. I want to get ratio_out for each gene in marker_gene. However, when I run this function, I get only one output instead of a list of 100 ratio_out values. Could anyone please help me fix it?
The output I got for
ratio_marker_out_2(marker_gene, 0)
is [1] 0.5354895.
The problem may be the built-in sum function.
It always returns a single number, so when you do:
ratio_out = sum(selected_cells_all_marker)/sum(gene_count_out_cluster)
you are dividing one number by another, which gives a single number.
So if you want a vector of ratios, drop the sum() around the numerator (depending on your calculations):
ratio_out = (selected_cells_all_marker)/sum(gene_count_out_cluster)
I have solved this issue using:
all_cells_all_markers[marker_gene, cluster_id!=Idents(glioblastoma)]
ratio_out = (selected_cells_all_marker)/sum(gene_count_out_cluster)
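Two details of the original loop seem to explain the single value: wrapping the gene names in list() makes `for (gene in marker_gene)` run only once, with gene bound to the whole character vector, and ratio_out is overwritten on every pass anyway. Below is a base-R sketch that returns one ratio per gene. It does not use Seurat: `counts` and `idents` are hypothetical stand-ins for glioblastoma@assays$RNA@counts and Idents(glioblastoma), and ratio_marker_out is an illustrative name.

```r
# Hypothetical stand-in for glioblastoma@assays$RNA@counts (genes x cells)
set.seed(1)
counts <- matrix(rpois(5 * 6, lambda = 3), nrow = 5,
                 dimnames = list(paste0("gene", 1:5), paste0("cell", 1:6)))
# Hypothetical stand-in for Idents(glioblastoma): one cluster label per cell
idents <- factor(c(0, 0, 1, 1, 2, 2))

ratio_marker_out <- function(counts, idents, marker_genes, cluster_id) {
  out_cells <- idents != cluster_id          # cells outside the cluster
  total_out <- sum(counts[, out_cells])      # all counts outside the cluster
  # sapply() collects one ratio per gene into a vector named by gene
  sapply(marker_genes, function(gene) sum(counts[gene, out_cells]) / total_out)
}

ratios <- ratio_marker_out(counts, idents, c("gene1", "gene3"), 0)
ratios
```

The same numbers can be computed without any loop as rowSums(counts[marker_genes, out_cells, drop = FALSE]) / total_out.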
Related
I'm still new to writing my own functions. As an exercise, and because I use it a lot, I want to write a flexible function to easily reverse survey response scales. This is what I came up with:
rev_scale = function(var, new_var, scale){
for (i in 1:length(abs(var))){
new_var[i] = scale-abs(var[i])+1
}
}
Info on code
var = variable I want to reverse.
new_var = new column with the reversed variable
scale = how many points in the scale (eg. 5 for a 5-point scale)
I use abs(var) instead of just var because some data frames also return value labels, and I only want the values in this function.
Question
When I apply this new function to a variable, R returns NULL. However, if I run the for loop separately, with the arguments filled in by hand ("imputed"), my new variable is reversed properly.
Any ideas on what is happening here?
Thanks in advance!
### Example of the (working) for-loop with arguments 'imputed' ###
df <- data.frame(matrix(ncol = 1, nrow = 4))
df$var = c(1,2,3,4)
for (i in 1:length(abs(df$var))){
df$var_rev[i] = 4-abs(df$var[i])+1
}
df$var_rev
OUTPUT:
[1] 4 3 2 1
R does not use reference variables (think pointers)*. So your new_var outside the function is not updated when it is referred to inside the function. Instead, R creates a new copy of new_var and updates that copy.
So return the new value from your function, i.e.:
rev_scale = function(var, scale){
res <- vector('numeric', length(var))
for (i in 1:length(abs(var))){
res[i] = scale-abs(var[i])+1
}
return(res)
}
Also note that I have removed new_var from the function's arguments. In other words, I have completely separated the function's input arguments from its output.
The reason you get NULL from the function is that in R, every function returns something. Without an explicit return(), a function returns the value of its last evaluated expression. A for or while loop always evaluates to NULL (invisibly), and an if without an else evaluates to NULL when its condition is FALSE, so a function that ends with a loop returns NULL.
* There are a couple of exceptions and workarounds, but I will not go into them here.
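A small base-R demonstration of both points, using a copy of the question's original function (renamed rev_scale_broken here for illustration): the caller's column is left unchanged, and the return value is NULL because the function body ends with a for loop.

```r
# The question's loop-based version, renamed for this demo
rev_scale_broken <- function(var, new_var, scale) {
  for (i in 1:length(abs(var))) {
    new_var[i] <- scale - abs(var[i]) + 1   # modifies the function's own copy
  }
}

df <- data.frame(var = c(1, 2, 3, 4))
df$var_rev <- NA                  # column the caller hopes will be filled
result <- rev_scale_broken(df$var, df$var_rev, 4)

print(result)      # NULL: the last expression evaluated was a for loop
print(df$var_rev)  # still all NA: only the copy inside the function changed
```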
Edit:
As benimwolfspelz noted, you do not need to iterate explicitly over each element of var, because arithmetic in R is vectorized and is applied to every element. Your entire function could be reduced to:
rev_scale = function(var, scale) {
scale-abs(var)+1
}
Secondly, in your for loop, you can simplify length(abs(var)) to length(var), as abs(var) does not change the length of the vector.
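With the vectorized version, the caller assigns the returned vector, which reproduces the output of the question's working loop. A minimal sketch using the same toy data as the question:

```r
rev_scale <- function(var, scale) {
  # Arithmetic is vectorized, so this reverses every element of var at once
  scale - abs(var) + 1
}

df <- data.frame(var = c(1, 2, 3, 4))
df$var_rev <- rev_scale(df$var, 4)  # store the returned vector in a new column
df$var_rev
# [1] 4 3 2 1
```

If you do keep a loop, seq_along(var) is safer than 1:length(var), because 1:length(var) yields c(1, 0) for an empty vector.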
I am trying to extract values from a vector to generate random numbers from a GEV distribution. I keep getting an error. This is my code:
x=rand(Truncated(Poisson(2),0,10),10)
t=[]
for i in 1:10 append!(t, maximum(rand(GeneralizedExtremeValue(2,4,3, x[i])))
I am new to this program and I think I am not passing the variable x properly. Any help will be appreciated. Thanks
If I understand correctly what you are trying to do, you might want something more like:
using Distributions  # provides Truncated, Poisson and GeneralizedExtremeValue
x = rand(Truncated(Poisson(2),0,10),10)
t = Float64[]
for i in 1:10
append!(t, max(rand(GeneralizedExtremeValue(2,4,3)), x[i]))
end
Among other things, you were missing a closing parenthesis (and the end that closes the for loop), and you probably want max instead of maximum here: max returns the largest of its arguments, while maximum returns the largest element of a single collection.
Also, while it would technically work, t = [] creates an empty array with element type Any, which tends to be very inefficient. You can avoid that by telling Julia what element type the array should hold, e.g. t = Float64[].
Finally, since you already know t only needs to hold ten results, you can make this more efficient still by pre-allocating t:
x = rand(Truncated(Poisson(2),0,10),10)
t = Array{Float64}(undef,10)
for i in 1:10
t[i] = max(rand(GeneralizedExtremeValue(2,4,3)), x[i])
end
I am trying to rename the columns of a time series using the assign function, as follows:
assign(colnames(paste0(<logic_to_get_dataset>)),
c(<logic_to_get_column_names>))
I am getting a warning:
In assign(colnames(get(paste0("xvars_", TopVars[j, 1], "_lag", :
only the first element is used as variable name
Also, the column names are not assigned. I think this is happening because of the colnames() function. Is there a workaround?
The issue is that assign() uses only the first element of the character vector you give it as the variable name.
You can try this, for example:
df = data.frame(x = 1:3, y = 4:2)
within(df, assign(colnames(df), c('a','b')))
You'll notice that R uses only the first column name (x) and tries to assign the whole vector c('a','b') to that variable; no column is renamed. This behavior is obviously not what you're looking for.
It's rather hacky, but you can always use something like this:
data.frame.name = get_df() # some function that returns the data frame's name as text
data.frame.columns = get_cols() # some function that returns the new column names as text
# quote each name so that parse() sees strings rather than variable names
eval(parse(text = paste0('colnames(', data.frame.name, ') = c(',
                         paste0('"', data.frame.columns, '"', collapse = ','), ')')))
I prefer to avoid this kind of expression, but it should work as intended.
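If the goal is only to rename the columns of an object whose name is built as a string, get() and assign() can do it without parse(): fetch the object, rename a copy, and assign it back under the same name. A sketch, assuming the object lives in the current environment; the object name and column names below are made up for illustration.

```r
# Hypothetical object whose name is only known as a string
xvars_a_lag1 <- data.frame(x = 1:3, y = 4:6)
df_name  <- paste0("xvars_", "a", "_lag", 1)
new_cols <- c("first", "second")

tmp <- get(df_name)          # fetch the object by its name
colnames(tmp) <- new_cols    # rename the local copy
assign(df_name, tmp)         # write it back under the same name

# Check the result without opening the dataset
colnames(get(df_name))
```

The three steps can also be written as assign(df_name, setNames(get(df_name), new_cols)). Inside a function, assign() writes to that function's environment by default, so pass envir = (for example globalenv()) if the object lives elsewhere.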
Here is what I ended up with:
temp_var <- paste0('colnames(var_',TopLines[j,1],'_lag',get(paste0('uniqLg_',TopLines[j,1]))[k,],'_',get(paste0('uniqLg_',TopLines[j,1]))[k,]+12 ,
') <- c(gsub( "xt',get(paste0('uniqLg_',TopLines[j,1]))[k,],'" , "xt',get(paste0('uniqLg_',TopLines[j,1]))[k,],'__',get(paste0('uniqLg_',TopLines[j,1]))[k,]+12,
'", colnames(var_',TopLines[j,1],'_xt',get(paste0('uniqLg_',TopLines[j,1]))[k,],')))')
print(temp_var )
eval(parse( text=temp_var ))
where TopLines is a data frame with one column that contains a list of lines. The only problem with this method is that I can't check the result of eval unless I actually open the dataset and see whether the changes have taken effect.
I have the following function, "cOrder":
library(MASS)
cOrder=function(anm,sir,dam){
maxloop=1000
i = 1
count = 0
mam=length(anm)
old = rep(1,mam)
new = old
while(i>0){
for (j in 1:mam){
ks = sir[j]
kd = dam[j]
gen = new[j]+1
if(ks != "NA"){
js = match(ks,anm)
if(gen > new[js]){new[js] = gen} #where error occurs
}
if(kd != "NA"){
jd = match(kd,anm)
if(gen > new[jd]){new[jd] = gen}
}
} # for loop
changes = sum(new - old)
old = new
i = changes
count = count + 1
if(count > maxloop){i=0}
} # while loop
return(new)
} # function loop
which works brilliantly when inputting the following dataset:
animal=c("bf","dd","ga","ec","fb","ag","he")
sire=c("dd","ga","NA","ga","NA","bf","dd")
dams=c("he","ec","NA","fb","NA","ec","fb")
gg=cOrder(animal,sire,dams)
but crashes and burns with the following:
animal=c("67947887","67947986","67948372","67948877","67948927","67949057","67950873","67951186","67951285","67951384","67951400","67951525","67951681","68045244","68045657","69999837","77542587","77542629","78468170","79879946")
sire=c("45334307","45334307","40684433","38121933","38141933","40684433","43339787","38431722","40684433","43339787","34931873","40684433","34931873","67951525","67951525","67950873","67951400","67951384","NA","67951681")
dams=c("37084407","25565110","36817369","21897145","21897145","20138814","32629901","37485356","25731548","32129629","31795768","37588084","36812355","68040013","68040500","68040443","67951855","67950980","67949065","67948307")
gg=cOrder(animal,sire,dams)
Error in if (gen > new[js]) { : missing value where TRUE/FALSE needed
Both datasets are entered as character vectors, so I don't think the problem is that one set uses letters and the other uses digits. Or could it be? I have also tried converting them to numeric, importing them from a .csv, unlisting them, etc. The error stays the same.
My individual IDs generally consist of 8-digit numeric codes. Any suggestions for preventing this error, or should I rename my whole population?
Thanks!
EDIT
The datasets are set up as follows: the first animal in the vector is the offspring of the first sire and dam in their respective vectors. Thus, according to the simple set, bf is the offspring of dd and he, dd of ga and ec, and the parents of ga are unknown.
The idea behind this function is to determine the "oldest" animal(s) in the dataset, i.e., the ones with the fewest generations, and eventually, in later code, order them accordingly and generate a relationship matrix. So it is supposed to be OK if an animal does not appear in the sire list; it means that it is an older animal, and the code is supposed to move on to the next one. It does this in the simple set, but not in the real one. Any ideas?
Thanks!
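It is because your first sire value (45334307) doesn't match anything in your animal list, so match() returns NA. new[NA] is then NA, and if (gen > NA) fails with exactly this error. In the small example every sire and dam also appears in animal, which is why it works there.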
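One way to keep the loop going is to skip any parent that match() cannot find in anm, treating it like an unknown parent. The sketch below is a rewritten variant (cOrder_safe is a made-up name) that keeps the original update rule but checks is.na() on the result of match(); whether unlisted parents should count as founders is a modelling decision for your pedigree.

```r
cOrder_safe <- function(anm, sir, dam, maxloop = 1000) {
  new <- rep(1, length(anm))
  for (count in seq_len(maxloop)) {
    old <- new
    for (j in seq_along(anm)) {
      gen <- new[j] + 1
      for (parent in c(sir[j], dam[j])) {
        # match() gives NA for "NA", for real NA values, and for parents
        # (such as 45334307) that are not listed in anm; skip those
        jp <- match(parent, anm)
        if (!is.na(jp) && gen > new[jp]) new[jp] <- gen
      }
    }
    if (sum(new - old) == 0) break  # stop once no generation number changes
  }
  new
}

animal <- c("bf","dd","ga","ec","fb","ag","he")
sire   <- c("dd","ga","NA","ga","NA","bf","dd")
dams   <- c("he","ec","NA","fb","NA","ec","fb")
gg <- cOrder_safe(animal, sire, dams)
gg
# [1] 2 4 6 5 6 1 3
```

On the small example this matches the original function; on the larger set it no longer stops at the first unmatched sire.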
I have a list of samples, each of varying lengths. I need to compare sample means (using a Mann-Whitney-Wilcoxon test) for all samples in the list. Current code is as follows:
wilcox.v = list() ##This creates the list of samples
for (i in df){
treat = list(i$treatment)
wilcox.v = c(wilcox.v,treat)
}
###This *should* iterate over all items in the list
wilcox = sapply(wilcox.v, function(i){ wilcox.test(as.numeric(wilcox.v[i,]), as.numeric(wilcox.v[-i,]), exact = FALSE)$p.value
})
I'd like to have the function return a vector of p-values, so that the broader function can re-sample if necessary.
The problem seems to lie in the need to compare a sample mean to all other sample means in the list.
I'm sure there's an easy way to do this (and I think it has something to do with calling indices correctly), but I'm not sure!
As joran said, your apply function is written a little wonky. There are two ways you can fix this.
Modify it so i is in fact an index reference:
wilcox = sapply(seq_along(wilcox.v), function(i) {
  # wilcox.v[[-i]] fails for more than two samples; unlist() pools all the others
  wilcox.test(as.numeric(wilcox.v[[i]]),
              as.numeric(unlist(wilcox.v[-i])), exact = FALSE)$p.value
})
Alternatively, modify your function so that it treats i as a list element. I'll leave this as an exercise for you (primarily because I don't want to deal with the wilcox.v[-i,] term).
Thanks for your help! This is the solution I ended up using. It's hardly elegant but it gets the job done.
mannwhit = vector()
for (i in mannwhit.v){
for (j in mannwhit.v){
if (identical(i,j) == FALSE){
p.val = wilcox.test(i, j, paired=FALSE)$p.value
mannwhit = c(mannwhit, p.val)
}
}
}
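The nested loop above tests every pair twice (once as i, j and once as j, i). A base-R sketch that tests each unordered pair once uses combn(); the samples list below is made-up data standing in for mannwhit.v.

```r
# Made-up samples of different lengths, standing in for mannwhit.v
set.seed(42)
samples <- list(a = rnorm(8), b = rnorm(12, mean = 1), c = rnorm(10))

# combn() calls FUN once per pair of sample indices and simplifies to a vector
p_vals <- combn(length(samples), 2, FUN = function(idx) {
  wilcox.test(samples[[idx[1]]], samples[[idx[2]]],
              paired = FALSE, exact = FALSE)$p.value
})
# Label each p-value with the pair it belongs to, e.g. "a-b"
names(p_vals) <- combn(names(samples), 2, FUN = paste, collapse = "-")
p_vals
```

The stats package also provides pairwise.wilcox.test(), which takes all values plus a grouping vector, e.g. pairwise.wilcox.test(unlist(samples), rep(names(samples), lengths(samples)), exact = FALSE), and adjusts the p-values for multiple comparisons (Holm by default).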