Transformation of flow cytometry data using flowCore in R

Does anyone know how to transform channel four (FLH 4) without using the standard transformations offered by the flowCore package?
The values of channel four are between 1 and 4096, and I need to convert them to values between 1 and 246 with the rule 10^(x/1024).
Thank you.

It is better to use the flowTrans mclMultivArcSinh transformation:
trans <- flowTrans(flowData, "mclMultivArcSinh", colnames(flowData)[3:12], n2f=FALSE, parameters.only=FALSE)
You must not transform FSC-A, SSC-A, and Time; that is why I use colnames(flowData)[3:12].

You could get a custom transform in by doing something like:
plot(transform(someFlowFrame, `FSC-H`=10^(`FSC-H`/1024), `SSC-H`=10^(`SSC-H`/1024)), c("FSC-H","SSC-H"))
(note the backticks, which are needed because the channel names contain hyphens). However, as 10^(4096/1024) returns a maximum value of 10000 for your hypothetical example, the plot with your ranges,
plot(transform(someFlowFrame, `FSC-H`=10^(`FSC-H`/1024), `SSC-H`=10^(`SSC-H`/1024)), c("FSC-H","SSC-H"), xlim=c(0,256), ylim=c(0,256))
doesn't look good.
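If you only need the transformed values rather than a plot, here is a minimal sketch along the same lines, assuming the channel is named `FL4-H` and `someFlowFrame` is your flowFrame (both names are placeholders):

```r
library(flowCore)

# Apply the asker's rule 10^(x/1024) to a single channel; backticks
# are required because the channel name contains a hyphen
transformed <- transform(someFlowFrame, `FL4-H` = 10^(`FL4-H` / 1024))

# Inspect the new range of that channel
range(exprs(transformed)[, "FL4-H"])
```

The expression-style syntax of transform() accepts any R function, so arbitrary custom rules can be plugged in without the built-in transform constructors.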

SPSS: correlating two vectors

I have two vectors in my dataset Vs = s1 to s10 and Vt= t1 to t10.
They describe two pictures and I want to know for each case what the correlation is.
However, there is no such function Cor(Vs, Vt), because vectors are apparently not usable in the standard functions. There is not even a mean(Vs)!
I tried to write syntax but also failed because of the problem of missing values (implementing pairwise deletion seems complex).
Any hint is welcome.
Is it possible to ask a question that is only seen by SPSS experts?
Calculating the correlation in the present structure is probably feasible but would be pretty complex. I suggest restructuring the data; then everything becomes easy:
The code assumes you have some line ID in the data, called lineNum.
If you don't, you'll need to create one using the first line.
compute lineNum=$casenum. /* this is only necessary if you don't have some other line ID.
varstocases /make V_s from S1 to S10 /make V_t from T1 to T10 /index=pairNum(V_s).
sort cases by lineNum.
split file by lineNum.
correlations V_s with V_t. /* you can edit the code here to add features to the analysis.
split file off.
That's it. Now the results will appear in the output window - one correlation for each of the original lines. If you need to import the correlations back to the original data you can do that by using OMS control to capture the results into a new dataset and then matching it back to the original file.

Expand a netCDF Variable into an additional Dimension or multiple Variables

I am working with a very large netCDF file in three dimensions (lat/lon/time). The resolution is 300 meters and the time variable has 25 steps, which leads to 64800x129600x25 cells.
The single variable contained in the file is an integer (ranging from -36 to 120), but it represents an underlying factor, which is the problem.
It is a land cover data set, so for example -20 means the cell is of the land type forest, and 10 means the cell is covered by water.
I want to reshape the netCDF file such that there is an additional dimension which represents every factor level of the original variable. And the variable would then be just a 1 or 0 per cell indicating the presence of every factor level at a certain lat/lon/time.
The dimensions would then be lat/lon/time/land type.
Here is an example data set that does not concern land type but is small enough to be used for testing, along with some code to read it in:
library(ncdf4)
# Download the data
download.file("http://schubert.atmos.colostate.edu/~cslocum/code/air.sig995.2012.nc",
              mode="wb", destfile="test.nc")
test.ncdf <- nc_open("test.nc", write=TRUE)
# See the lon,lat,time dimensions
print(test.ncdf)
tmp.array <- ncvar_get(test.ncdf, varid="air")
I'm not sure whether the raster package is better suited for this task. For very small netCDF files I have managed to get the intended result to some extent, by extracting the data and then stacking it as a data.frame.
Any help or pointing in the right direction would be greatly appreciated.
Thanks in advance.
If I understand correctly, you want a set of fields, one per type, that are 1 or 0 as a function of lat/lon/time; e.g. for forest you want an array which is 1 where the factor equals -20 and 0 otherwise.
I know you want this as a four-dimensional array, for which I expect you will need R, as you tagged the question. But if you don't mind having a series of 3D arrays, one per type, a quick and easy way is to use CDO to process the integer array:
cdo eqc,-20 air.sig995.2012.nc test.nc
The issue with this is that the output variable still has the same name (you don't say what it is called, so I refer to it as sfctype), so you would need to change the metadata with NCO.
A better way is therefore to use expr in CDO:
cdo expr,"forest=sfctype==-20" air.sig995.2012.nc forest.nc
This makes a new variable called forest which is 1 or 0.
You could now process all the types you want, and then merge them into one file:
cdo expr,"forest=(sfctype==-20)" air.sig995.2012.nc type_forest.nc
cdo expr,"forest=(sfctype==10)" air.sig995.2012.nc type_water.nc
...etc...
cdo merge type_*.nc combined_file.nc
(I don't think you need the parentheses, but they make the syntax clearer.)
...almost what you wanted in a few lines, but not quite... I am not sure how to "stack" these new variables into a 4D array if you really need that, but perhaps NCO can do it.
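If you do want the full 4D array in R, here is a minimal sketch with ncdf4, assuming the land-cover variable is called sfctype as above (with the linked test file you would use "air", and its continuous values would first need rounding to discrete codes):

```r
library(ncdf4)

nc <- nc_open("test.nc")
sfctype <- ncvar_get(nc, "sfctype")  # lon x lat x time array (assumed name)
nc_close(nc)

# Every factor level present in the data
types <- sort(unique(as.vector(sfctype)))

# 4D array, lon x lat x time x land type: 1 where the cell has that type
onehot <- array(0L, dim = c(dim(sfctype), length(types)),
                dimnames = list(NULL, NULL, NULL, as.character(types)))
for (i in seq_along(types)) {
  onehot[, , , i] <- as.integer(sfctype == types[i])
}
```

Note that at the full 64800 x 129600 x 25 size such an array will not fit in memory, so in practice you would loop over types and write each 0/1 field back to a netCDF variable one at a time.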

how to use for loop to calculate the same formula among different data sets in R?

I have 64 different data sets: data1, data2, data3 ... data64.
I need to calculate the DTW distance between each pair of data sets and finally get the distance matrix. The code I am using is:
zooData <- zoo(data1$length, data1$Time.Elapsed)
zooData2 <- zoo(data2$length, data2$Time.Elapsed)
alignment <- dtw(zooData, zooData2)
alignment$normalizedDistance
In this way, I have to manually change the data set names one by one, which is super tedious. I am thinking I can use a for loop to solve this problem; maybe I can put the 64 data sets into a list and loop over it? I am not sure how I can achieve this in R. Can anyone help me? Thank you very much!
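No answer is recorded here, but one sketch of the list-based approach, assuming the 64 data frames really are named data1 ... data64 in the global environment:

```r
library(zoo)
library(dtw)

# Collect the 64 data frames into a list by name
dataList <- mget(paste0("data", 1:64))

n <- length(dataList)
distMat <- matrix(0, n, n,
                  dimnames = list(names(dataList), names(dataList)))

# Fill the upper triangle, mirroring into the lower triangle
for (i in seq_len(n - 1)) {
  zi <- zoo(dataList[[i]]$length, dataList[[i]]$Time.Elapsed)
  for (j in (i + 1):n) {
    zj <- zoo(dataList[[j]]$length, dataList[[j]]$Time.Elapsed)
    d <- dtw(zi, zj)$normalizedDistance
    distMat[i, j] <- d
    distMat[j, i] <- d  # treating the DTW distance as symmetric here
  }
}
```

Note that the normalized DTW distance is not guaranteed to be symmetric for every step pattern, so mirroring the matrix is itself an assumption.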

Is there a way to read a raw netcdf file and tell what layer a value belongs to?

I'm in the process of evaluating how successful a script I wrote is, and a quick-and-dirty method I've employed is looking at the first few and last few values of a single variable and doing a few calculations with them, based on the same values in another netCDF file.
I know there are better ways to approach this, but again, this is a really quick-and-dirty method that has worked for me so far. My question, though, is: by looking at the raw data through ncdump, is there a way to tell which vertical layer the data belongs to? In my example, the file has 14 layers. I'm assuming that the first few values are part of the surface layer and the last few values are part of the top layer, but I suspect that this assumption is wrong, at least in part.
As a follow-up question, what would then be the easiest 'proper' way to tell what layer data belongs to? Thank you in advance!
ncview and NCO are both powerful and quick command line tools for viewing data inside a netCDF file.
ncview: http://meteora.ucsd.edu/~pierce/ncview_home_page.html
NCO: http://nco.sourceforge.net/
You can easily show a variable over all layers, for example with
ncks -d layer,0,13 some_infile.nc
ncdump dumps the data with the last dimension varying fastest (http://www.unidata.ucar.edu/software/netcdf/docs/netcdf/CDL-Syntax.html) so if 'layer' is the slowest/first dimension, the earlier values are all in the first layer, while the last few values are in the last layer.
As to whether the first layer is the top or bottom layer, you'd have to look to the 'layer' dimension and its data.
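A more programmatic way to check layer membership, sketched in R with ncdf4 (the file name, variable name, and dimension positions below are all placeholders):

```r
library(ncdf4)

nc <- nc_open("some_infile.nc")

# List the variable's dimensions; note that ncdf4 reports them in R's
# column-major order, which is the reverse of the CDL order ncdump shows
v <- nc$var[["some_variable"]]
sapply(v$dim, function(d) d$name)

# Extract a single vertical layer explicitly instead of guessing from
# ncdump's flattened output; here 'layer' is assumed to be the 3rd of
# 4 dimensions (count = -1 means "all values along that dimension")
layer1 <- ncvar_get(nc, "some_variable",
                    start = c(1, 1, 1, 1), count = c(-1, -1, 1, -1))
nc_close(nc)
```

Slicing with start/count removes any ambiguity about which values belong to which layer, regardless of the storage order.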

Selecting first n observations of each group

My coworkers and I enter data in turns. One day I do, the next week someone else does, and we always enter 50 observations at a time (into an Excel sheet). So I can be pretty sure that I entered the cases from 101 to 150, and 301 to 350. We then read the data into R to work with it. How can I select only the cases I entered?
Now I know that I can do that by copying from the Excel sheet; however, I wonder if it is doable in R.
I checked several documents about subsetting data with R, and also tried things like
data <- data[101:150 & 301:350, ]
but it didn't work. I would appreciate it if someone could point me to a more comprehensive guide answering this question.
The answer to the specific example you gave is
data[c(101:150, 301:350), ]
Can you be more specific about which cases you want? Is it the first 50 of each 100, or the first 50 of each 300, or ... ? To get the indices for the first n of each m cases you could use something like
c(outer(0:4,seq(1,100,by=10),"+"))
(here n=5, m=10); outer is a generalized outer product. An alternative (and possibly more intuitive) solution would use rep, e.g.
rep(0:4,10) + rep(seq(1,100,by=10),each=5)
Because R automatically recycles vectors where necessary you could actually shorten this to:
0:4 + rep(seq(1,100,by=10),each=5)
but I would recommend the slightly longer formulation as more understandable.
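Applied to the asker's actual pattern (blocks of 50 entered in alternating turns, i.e. rows 101-150, 301-350, and so on; the data frame name and the every-200-rows spacing are assumptions):

```r
# Start indices of the asker's blocks: 101, 301, 501, ...
starts <- seq(101, 1000, by = 200)

# Expand each start into its block of 50 consecutive row indices
idx <- c(outer(0:49, starts, "+"))

# Keep only indices that actually exist in the data, then subset
idx <- idx[idx <= nrow(data)]
mine <- data[idx, ]
```

This is the same outer() idiom as above with n=50 and the block starts spelled out explicitly.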
