SQLite AND/OR statement not working

When I run the following query:
select *
from storm
where (variable = "TMP" OR
variable = "VVEL" OR
variable = "UGRD" OR
variable = "VGRD" OR
variable = "RH" OR
variable = "HGT") AND level >=150 AND level <=200
The variable conditions filter the rows as I expect, but the level conditions seem to be ignored completely. Rows for all levels are returned, but I only want those with a level between 150 and 200.
Any suggestions?
Thanks in advance!

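Try this, which uses IN and BETWEEN and puts the string literals in single quotes (in standard SQL, double quotes denote identifiers):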
select * from storm where variable IN ('TMP', 'VVEL', 'UGRD', 'VGRD', 'RH', 'HGT') AND level BETWEEN 150 AND 200

for loop function in Seurat analysis in R

I am trying to get multiple outputs from a function I wrote:
ratio_marker_out_2 = function(marker_gene, cluster_id){
  marker_gene = list(row.names(FindMarkers(glioblastoma, ident.1 = cluster_id)))
  for (gene in marker_gene){
    all_cells_all_markers = glioblastoma@assays$RNA@counts[gene,]
    selected_cells_all_marker = all_cells_all_markers[cluster_id!=Idents(glioblastoma)]
    gene_count_out_cluster = glioblastoma@assays$RNA@counts[,cluster_id!=Idents(glioblastoma)]
    ratio_out = sum(selected_cells_all_marker)/sum(gene_count_out_cluster)
  }
  return(ratio_out)
}
Here, marker_gene contains hundreds of genes. Let's say its length is 100. I want to get ratio_out for each gene in marker_gene. However, when I run this function, I only get one output instead of a list of 100 ratio_out values. Could anyone please help me fix it?
The output I got for
ratio_marker_out_2(marker_gene, 0)
is [1] 0.5354895.
The cause may be the built-in sum function. It collapses its input to a single number, so when you do:
ratio_out = sum(selected_cells_all_marker)/sum(gene_count_out_cluster)
you are dividing one number by another, which gives a single number.
If you want one value per element instead, drop the first sum (depending on what you need to calculate):
ratio_out = (selected_cells_all_marker)/sum(gene_count_out_cluster)
I have solved this issue using:
all_cells_all_markers[marker_gene, cluster_id!=Idents(glioblastoma)]
ratio_out = (selected_cells_all_marker)/sum(gene_count_out_cluster)
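As a minimal sketch of collecting one ratio per gene, the example below uses a small toy matrix in place of glioblastoma@assays$RNA@counts and a plain vector in place of Idents(glioblastoma); both are assumptions made so the code runs without Seurat, and the function name ratio_marker_out is hypothetical. Two things in the original function look worth checking: wrapping the gene names in list() gives a list of length one, so the for loop runs only once with the whole vector as gene, and ratio_out is overwritten on every pass rather than stored per gene.

```r
# Toy stand-in for the counts matrix: genes in rows, cells in columns (assumed data).
counts <- matrix(c(1, 0, 3, 2,
                   4, 4, 0, 2,
                   0, 1, 1, 0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 paste0("cell", 1:4)))

# Hypothetical cluster label per cell, standing in for Idents(glioblastoma).
idents <- c(0, 0, 1, 1)

ratio_marker_out <- function(marker_genes, cluster_id) {
  outside <- idents != cluster_id            # cells outside the cluster
  total_outside <- sum(counts[, outside])    # all counts outside the cluster

  # Pre-allocate one slot per gene so each iteration stores its own result.
  ratios <- numeric(length(marker_genes))
  names(ratios) <- marker_genes
  for (gene in marker_genes) {               # marker_genes is a character vector, not a list
    ratios[gene] <- sum(counts[gene, outside]) / total_outside
  }
  ratios
}

ratios <- ratio_marker_out(c("geneA", "geneB"), 0)
```

The result is a named numeric vector with one entry per gene, so it can be indexed by gene name afterwards.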

Why is there different behaviour between the assignment symbols “=” and “<-” in R?

I'm assigning values within the output list of a function, like:
nofun = function(sth){
  # something happening here
  metrics = list(
    metric1 = value1,
    metric2 <- value2 )
  return(metrics)
}
When I query metrics, I notice that <- and = behave differently: the former only stores the value in an element with no name (i.e. "x1" = value1), while the latter also applies the correct name (i.e. metric1 = value1).
This behaviour is also mentioned for data.frame at the bottom of an older, more generic question, but there is no explanation of this specific use case.
It caused me quite a few headaches and wasted time before I noticed it, and I didn't find any other useful information.
Thanks in advance!
To define a named list you have to use the syntax list(name1 = value1, name2 = value2, ...). A list defined in this way has a names attribute containing the element names.
Writing name2 <- value2 assigns value2 to a variable name2. If you write this inside a list definition (list(name2 <- value2)), the value is included in the list but no name is recorded for it. So it is equivalent to:
name2 <- value2
list(name2)
You can compare both statements:
attributes(list(a=3))
# $names
# [1] "a"
attributes(list(a<-3))
# NULL
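A small sketch of the mixed case from the question, where one element uses = and the other uses <-. The values 1 and 2 are placeholders chosen for illustration.

```r
# One element named with '=', one "assigned" with '<-' inside the call.
metrics <- list(metric1 = 1, metric2 <- 2)

# Only the first element gets a name; the second name is an empty string.
list_names <- names(metrics)

# Side effect: '<-' also created a variable metric2 in the calling environment.
side_effect_value <- metric2
```

Inside a function call, = matches an argument name, whereas <- is an ordinary assignment whose result is passed as an unnamed argument.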

How do I read a column as categorical in CSV.jl?

I have tried many variations of the following:
f = CSV.File(file, delim="\t",
header=["C" * string(i) for i in 1:6],
types=Dict("C1"=>CategoricalArray))
In pandas I would use the string "category" to describe the datatype.
Alternatively, if I want to build a dataframe from scratch, can I say something like
df = DataFrame(Chromosome = CategoricalArray[], Start = Int64[], End = Int64[], Name = Int64[], Score = Int64[], Strand = CategoricalArray[])
I have tried it, but then I get the error:
ERROR: LoadError: ArgumentError: Error adding chr8 to column :Chromosome. Possible type mis-match.
Like Bogumil said, the best thing is probably to use the flag
CSV.read(..., categorical = true)
For columns with mostly unique data, this will add overhead, but is the best way to do it for now.

R: "missing value where TRUE/FALSE needed" but works with another similar dataset?

I have the following function "cOrder":
library(MASS)
cOrder=function(anm,sir,dam){
  maxloop=1000
  i = 1
  count = 0
  mam=length(anm)
  old = rep(1,mam)
  new = old
  while(i>0){
    for (j in 1:mam){
      ks = sir[j]
      kd = dam[j]
      gen = new[j]+1
      if(ks != "NA"){
        js = match(ks,anm)
        if(gen > new[js]){new[js] = gen} #where error occurs
      }
      if(kd != "NA"){
        jd = match(kd,anm)
        if(gen > new[jd]){new[jd] = gen}
      }
    } # for loop
    changes = sum(new - old)
    old = new
    i = changes
    count = count + 1
    if(count > maxloop){i=0}
  } # while loop
  return(new)
} # function loop
which works brilliantly when given the following dataset:
animal=c("bf","dd","ga","ec","fb","ag","he")
sire=c("dd","ga","NA","ga","NA","bf","dd")
dams=c("he","ec","NA","fb","NA","ec","fb")
gg=cOrder(animal,sire,dams)
but crashes and burns with the following:
animal=c("67947887","67947986","67948372","67948877","67948927","67949057","67950873","67951186","67951285","67951384","67951400","67951525","67951681","68045244","68045657","69999837","77542587","77542629","78468170","79879946")
sire=c("45334307","45334307","40684433","38121933","38141933","40684433","43339787","38431722","40684433","43339787","34931873","40684433","34931873","67951525","67951525","67950873","67951400","67951384","NA","67951681")
dams=c("37084407","25565110","36817369","21897145","21897145","20138814","32629901","37485356","25731548","32129629","31795768","37588084","36812355","68040013","68040500","68040443","67951855","67950980","67949065","67948307")
gg=cOrder(animal,sire,dams)
>Error in if (gen > new[js]) { : missing value where TRUE/FALSE needed
Both of these are passed in as character vectors, so I don't think it matters that one set contains letters and the other numeric digits. Or could it? I have also tried making them numeric, importing them from a .csv, unlisting them, etc. The error stays the same.
My individual names generally consist of 8-digit numeric codes. Any suggestions for preventing this error, or do I need to rename my whole population?
Thanks!
EDIT
The datasets are set up as follows: the first animal in the vector is the offspring of the first dam and sire in their respective vectors. Thus, according to the simple set, bf is the offspring of dd and he, dd of ga and ec, and the parents of ga are unknown.
The idea behind this function is to determine the "oldest" animal/s in the dataset, i.e., the ones with the least number of generations, and eventually in succeeding code order them accordingly and generate a relationship matrix. So it is supposed to be OK if an animal does not appear in the sire list; it means that it is an older animal. So the code is supposed to move on to the next. Which it does in the simple set, but not in the proper one. Any ideas?
Thanks!
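This happens because your first sire value (45334307) doesn't match anything in your animal vector, so match() returns NA. Indexing with it gives new[NA], which is NA, so gen > new[js] evaluates to NA and if() stops with "missing value where TRUE/FALSE needed". In the simple dataset every sire and dam other than "NA" also appears in animal, so match() never returns NA there.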
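The following sketch reproduces the NA and shows one possible guard, assuming that parents missing from the animal vector should simply be skipped (as the EDIT above suggests). The three-element animal vector and the starting values are toy data.

```r
animal <- c("bf", "dd", "ga")      # toy animal vector
new <- c(1, 1, 1)                  # current generation numbers
gen <- 2

# A sire that is not in the animal vector: match() returns NA.
js <- match("45334307", animal)
comparison <- gen > new[js]        # NA, which if() cannot handle

# Guarded update: only compare when the parent was actually found.
if (!is.na(js) && gen > new[js]) new[js] <- gen
after_unknown <- new               # unchanged, because js is NA

# A sire that is in the animal vector is updated as before.
js_known <- match("dd", animal)
if (!is.na(js_known) && gen > new[js_known]) new[js_known] <- gen
after_known <- new
```

Because && short-circuits, gen > new[js] is never evaluated when js is NA, so the condition passed to if() is always TRUE or FALSE.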

Custom function does not work in R 'ddply' function

I am trying to use a custom function inside 'ddply' in order to create a new variable (NormViability) in my data frame, based on values of a pre-existing variable (CelltiterGLO).
The function is meant to create a rescaled (%) value of 'CelltiterGLO' based on the mean 'CelltiterGLO' values at a specific sub-level of the variable 'Concentration_nM' (0.01).
So if the mean of 'CelltiterGLO' at 'Concentration_nM'==0.01 is set as 100, I want to rescale all other values of 'CelltiterGLO' over the levels of other variables ('CTSC', 'Time_h' and 'ExpType').
The normalization function is the following:
normalize.fun = function(CelltiterGLO) {
  idx = Concentration_nM==0.01
  jnk = mean(CelltiterGLO[idx], na.rm = TRUE)
  out = 100*(CelltiterGLO/jnk)
  return(out)
}
and this is the code I try to apply to my dataframe:
library("plyr")
df.bis=ddply(df,
             .(CTSC, Time_h, ExpType),
             transform,
             NormViability = normalize.fun(CelltiterGLO))
The code runs, but when I double check (with aggregate or tapply) whether the mean of 'NormViability' equals 100 at 'Concentration_nM'==0.01, I do not get 100, but different numbers. However, if I subset my df by the two levels of the variable 'ExpType', the code returns the correct numbers on each separate subset. I tried making 'ExpType' either character or factor but got similar results. 'ExpType' has two levels/values, "Combinations" and "DoseResponse". I can't figure out why the code is not working on the entire df. I wonder if this is because the two levels of 'ExpType' do not contain the same number of levels for all the other variables, e.g. one of the levels of 'Time_h' is missing for the level "Combinations" of 'ExpType'.
Thanks very much for your help, and I apologize in advance if the answer is already on Stack Overflow and I was not able to find it.
Michele
I (the OP) found out that the function body used a variable, Concentration_nM, that was missing from its arguments. Simply adding Concentration_nM as an argument to the custom function solved the problem.
THANKS
m.
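A sketch of the corrected function, with Concentration_nM passed in explicitly. To keep the example runnable without plyr, the per-group step uses base R split() and lapply() as a stand-in for ddply(..., transform, ...); the data frame is toy data with only one grouping variable, not the OP's.

```r
# Corrected function: the concentration vector is now an explicit argument.
normalize.fun <- function(CelltiterGLO, Concentration_nM) {
  idx <- Concentration_nM == 0.01
  ref <- mean(CelltiterGLO[idx], na.rm = TRUE)   # group mean at 0.01 nM
  100 * (CelltiterGLO / ref)
}

# Toy data (assumed): two experiment types, four measurements each.
df <- data.frame(
  ExpType = rep(c("Combinations", "DoseResponse"), each = 4),
  Concentration_nM = rep(c(0.01, 0.01, 1, 10), times = 2),
  CelltiterGLO = c(10, 30, 10, 5, 100, 300, 50, 20)
)

# Base R stand-in for the ddply call: normalize within each group.
groups <- split(df, df$ExpType)
df.bis <- do.call(rbind, lapply(groups, function(g) {
  g$NormViability <- normalize.fun(g$CelltiterGLO, g$Concentration_nM)
  g
}))

# Check: the mean at 0.01 nM should be 100 in every group.
ref_rows <- df.bis$Concentration_nM == 0.01
check <- tapply(df.bis$NormViability[ref_rows], df.bis$ExpType[ref_rows], mean)
```

With plyr, the corresponding change to the original call would be NormViability = normalize.fun(CelltiterGLO, Concentration_nM), so that both columns come from the current group's subset.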
