Error in creating a dummy variable for plink GWAS? - dummy-variable

I'm currently extracting phenotype/covariate data I need to run a GWAS. I am also trying to convert the 'bank_centre location' to a dummy variable. In attempting to create the data file to run the GWAS, I keep getting an error for an extra "]" in my code relating to the dummy variable, however I don't see how there is an extra "]" in my code.
This is the first part of my input:
source scriptSettings.sh
Rscript - <<'END'
library(data.table)
d<-fread("phenos.txt")
message("create array dummy")
d[,array01:=as.numeric(as.numeric(f.22000.0.0)>1)]
message ("create bank_centre dummy")
d[,'bank1']<-0
d[,'bank2']<-0
d[,'bank3']<-0
d[which(d[,'f.54.2.0']==11025,'bank1']<-1
d[which(d[,'f.54.2.0']==11026,'bank2']<-1
d[which(d[,'f.54.2.0']==11027,'bank3']<-1
The output I get so far is:
create array dummy
create biobank_centre dummy
Error: unexpected ']' in "d[which(d[,'f.54.2.0']==11025,'bank1']"
Execution halted
Apologies if I am missing something super obvious - any help is much appreciated

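The call to which( is never closed: each of the three assignment lines is missing a ) before the comma, so R reaches the ] while it is still parsing the arguments of which().
Close the parenthesis on all three lines, for example: d[which(d[,'f.54.2.0']==11025),'bank1']<-1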

Related

Xcos throws "Undefined variable: scifunc_block_m" message in console

When I run a Xcos model containing a scifunc_block_m block like shown below
I get an error message relating to data dimensions inconsistency:
"Data dimensions are inconsistent:"
" Variable size=[1,1]"
"Block output size=[100,1]."
But when I double-click the block to see what I can change to make the dimensions correct, I get a message in the console saying
Undefined variable: scifunc_block_m
What bugs me is that scifunc_block_m is not the name of any variable, but rather the name of the block itself, as can be seen in the official docs.
Of course I double-checked that I have no variable with that name, either in my function phase_shifter or anywhere else.
I tried with Scilab 6.1.1 and 6.1.0, believing that it might be a bug, but apparently not.
In your phase_shifter.sce file, which generates the input variable,
the signalIn variable does not comply with the requirements of the From Workspace block. Its documentation says that the input variable
must be a structure with time and values fields:
.time must be a column vector, and, in your case,
.values must also be a column vector.
So,
t = (0:1/fs:Npp/fs - 1/fs); // time vector
signalIn = A*%e^(%i*w*t);
should be replaced with
t = (0:1/fs:Npp/fs - 1/fs)'; // time column vector
signalIn = struct("time",t, "values",A*%e^(%i*w*t));
This fixes the inconsistent dimensions message.
In addition, I am not able to reproduce your issue about Undefined variable: scifunc_block_m. The parameters interface opens as expected.
You may get this kind of message if you try to run some Xcos parts outside of Xcos without first loading the Xcos-related libraries.
Then, we get an unclear "Output should be of complex type." message on the From Workspace block.
By the way, you are trying to plot complex values. Please have a look at the MATMAGPHI block before entering MUX: https://help.scilab.org/docs/6.1.1/en_US/MATMAGPHI.html

Is there a way to check where R is 'stuck' within a for loop? (R)

I am using system() to run several files iteratively through a program via CMD. It deposits each output into a sub-directory designated specifically and only for that input file, so the number of inputs is exactly equal to the number of output directories/outputs.
My code works for the first iteration, but I can see in the console that it won't move on to the second file after completing the first. The stop sign remains active, so I know R is still 'running', but since the for loop environment is unique I can't really tell what it's stuck on. It just stays like this for hours, so I'm not sure how to begin to diagnose the issue. Is there a way of tracing what happened after cancelling the code, for example?
If you're curious, the code looks like this. I don't know how to make it reproducible, so I just commented each line:
for (i in 1:length(flist)) {
## flist is a vector of character strings. Each row of characters is both
## the name of the input file and the name of the output directory
setwd(paste0(solutions_dir, "\\", flist[i]))
# sets the appropriate dir
system(paste0(program_dir,"\\program.exe I=",
file_dir, "\\", flist[i], " O=",solutions_dir, "\\", flist[i],
"\\solv"))
## line that inputs program's exe file and the appropriate input/output locations
}

Shortening .write commands

I am learning from the book Learn Python The Hard Way 3.6, by Zed Shaw.
There is a series of six target.write commands towards the bottom of the script, and he wants me to simplify them into a single target.write command using string formats and escapes. However, I am stuck.
Here is the original code:
from sys import argv
script, filename = argv
print(f"We're going to erase {filename}")
print("If you don't want that, hit CTRL-C (^C).")
print("If you do want that, hit RETURN.")
input("?")
print("Opening the file...")
target = open(filename,'w')
print("Truncating the file. Goodbye!")
target.truncate()
print("Now I'm going to ask you for three lines")
line1 = input("line 1:")
line2 = input("line 2:")
line3 = input("line 3:")
print("Im going to write these to the file.")
target.write(line1)
target.write("\n")
target.write(line2)
target.write("\n")
target.write(line3)
target.write("\n")
print("And finnaly, we close it")
target.close()
So far I have tried
target.write(line1),(line2),(line3)
but this has a logic error: it writes only one line, not all three.
target.write(line1) + (line2) + (line3)
With this one I get the error:
'unsupported operand types for +: 'int' + 'str'
target.write(line1),\n,(line2)\n(line3),\n
With this one I get the error:
unexpected character after line continuation character
(<string>,line 22)
I have been googling and searching here for answers but have not found anything. One person posted a very similar question except for Zed's 2.7 book. However I am reading Zed's 3.6 book so the answers were no help to me unfortunately.
I'm not familiar with the book, so I'm not sure what you have and haven't covered so far, but one way to do what you want is to format the string first and then pass it to the write method like this:
target.write("{0}\n{1}\n{2}\n".format(line1, line2, line3))
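As a minimal sketch of the same idea, here are two other ways to build the string before the single write call: an f-string (the style the script already uses in its first print) and str.join. The values of line1, line2 and line3 and the temporary file are hypothetical stand-ins for the input() calls and the filename in the original script. The sketch also stores the return value of write(), which is an integer count of characters; that may be why the `target.write(line1) + (line2) + (line3)` attempt complains about adding an 'int' and a 'str'.

```python
import os
import tempfile

# Hypothetical stand-ins for the three values read with input() in the script.
line1, line2, line3 = "first", "second", "third"

# Option 1: an f-string with "\n" escapes, built once and written in one call.
text_fstring = f"{line1}\n{line2}\n{line3}\n"

# Option 2: join the lines with "\n" and append the final newline.
text_join = "\n".join([line1, line2, line3]) + "\n"

# Write to a temporary file so the sketch does not touch any real data.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as target:
    # write() returns the number of characters written (an int), not the file.
    chars_written = target.write(text_fstring)

# Read the file back to check what was actually written.
with open(path) as f:
    contents = f.read()
```

Both options produce the same string, so which one to use is mostly a matter of taste; the f-string is closest to what the exercise asks for ("string formats and escapes").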

Undefined columns selected error in R

I apologize in advance because I'm extremely new to coding and was thrust into it just a few days ago by my boss for a project.
My data set is called s1. It has 123 variables, and 4 of them have some form of "QISSUE" in their name. I want to take these four variables and duplicate them all, adding "Rec" to the end of each name (that way I can freely play with the new variables while still keeping the original ones).
Running this line of code keeps giving me an error:
b<- llply(s1[,
str_c(names(s1)
[str_detect(names(s1), fixed("QISSUE"))],
"Rec")],table)
The error is as such:
Error in `[.data.frame`(s1, , str_c(names(s1)[str_detect(names(s1), fixed("QISSUE")) & :
undefined columns selected
Thank you!
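The error occurs because the columns with the "Rec" suffix do not exist in s1 yet, so they cannot be selected. Use this to select the existing columns and build the renamed copies. Of course, there are other ways to do this with simpler code: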
b<- llply(s1[,
names(s1)[str_detect(names(s1), fixed("QISSUE"))]
],c)
nwnam=str_c(names(s1)[str_detect(names(s1), fixed("QISSUE"))],"Rec")
ndf=data.frame(do.call(cbind,b));colnames(ndf)=nwnam
ndf
# of course you can do
cbind(s1,ndf)

glm probit "cannot find valid starting values" error message

I am trying to use R for the first time to do some probit analysis.
I get the following error:
Error in if (!(validmu(mu) && valideta(eta))) stop("cannot find valid starting values: please specify some", :
missing value where TRUE/FALSE needed
This is the command I am using:
m1=glm(Good~Stg.Days+Dev.Deployments+Check.Ins+NoOfDevelopers,family=poisson(link = "probit"),data=deploy[1:4,])
My data deploy[1:4,], as loaded from a CSV file, are as follows:
Good,Application Type,Project,Start Date,End Date,Stg Days,Dev Deployments,Check Ins,NoOfDevelopers
1,DocumentPlatform,ZCP,11/08/2010,11/11/2010,0.6,0,12,4
1,DocumentPlatform,ZCP,11/11/2010,09/12/2010,0.4,0,4,1
0,DocumentPlatform,ZCP,09/12/2010,07/03/2011,10,0,7,3
1,FactsheetPlatform,Process.ARCH,28/06/2010,09/03/2011,7.1,0,18,2
deploy is in reality much bigger than rows 1:4; I am just using a subset of the data to help determine the problem.
Any ideas what is wrong?
As I commented: using ?glm, I found that the poisson family supports the following links: log, identity, and sqrt.
Testing on another link:
test <- data.frame('Good'=c(1,1,0,1),'Stg Days'=c(0.6,0.4,10,7.1),'Dev Deployments'=c(0,0,0,0),'Check Ins'=c(12,4,7,18),'NoOfDevelopers'=c(4,1,3,2))
m1=glm(Good~ . ,family=poisson(link = "log"),data=test)
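This gives no errors, so I think your link = "probit" is the problem. If you want a probit model for a binary response such as Good, note that the binomial family does accept link = "probit".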
