colnames intgroup argument of arrayQualityMetrics package of Biobase - r

I am using the Bioconductor package arrayQualityMetrics to create plots for visualizing microarray data.
My data is stored in an ExpressionSet.
One of the columns of phenoData(ExpressionSet) is named "Tissue", but when I run the following command:
arrayQualityMetrics(ExpressionSet,intgroup = "Tissue")
I get this error:
Error in prepdata(expressionset, intgroup = intgroup, do.logtransform = do.logtransform) :
all elements of 'intgroup' should match column names of 'pData(expressionset)'.
I don't understand why I am getting this error, since my ExpressionSet does contain a column named "Tissue" in its phenoData.

It's been a while since you asked this question, but this is likely because arrayQualityMetrics() has to trim the data frame in your pData() slot down to a limited number of fields for display in the metadata table at the beginning of the report.
Try something like:
tmp <- pData(ExpressionSet)
pData(ExpressionSet) <- tmp[,c("Tissue", "SomeOtherInterestingField")] # swap out
arrayQualityMetrics(ExpressionSet,intgroup="Tissue")
pData(ExpressionSet) <- tmp # replace with your original full pData() data frame
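Before swapping pData() in and out, it may also be worth checking that the name really matches. The sketch below uses a plain data.frame as a stand-in for pData(ExpressionSet); the column names and the trailing space are made up for illustration and are not a diagnosis of the original data.

```r
# Hypothetical stand-in for pData(ExpressionSet).
pd <- data.frame(Sample = c("s1", "s2"), Tissue = c("liver", "brain"))
names(pd)[2] <- "Tissue "   # made-up mismatch: trailing space

intgroup <- "Tissue"

# Names in intgroup that are not exact column names of the phenotype table.
missing_cols <- setdiff(intgroup, colnames(pd))

# Columns that would match if case and surrounding whitespace were ignored.
near <- colnames(pd)[tolower(trimws(colnames(pd))) %in% tolower(intgroup)]

# One possible repair: strip the whitespace, then check again.
colnames(pd) <- trimws(colnames(pd))
ok <- all(intgroup %in% colnames(pd))
```

If missing_cols is empty for the real pData(), the name itself is not the problem and the trimming workaround above is the next thing to try.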

Related

Selecting and renaming columns in SpatialPointsDataFrame

I'm working with a feature class dataset extracted from a geodatabase, which I've filtered to my area of interest and intersected with a SpatialPointsDataFrame. In order to export it to a shapefile with writeOGR I need to format the attribute names, and I also want to select only specific columns to export in my final shapefile. I have been running into a lot of errors using standard select or base R subsetting techniques. For some reason R doesn't seem to recognize the column names when I try to select. I've tried lots of different methods and can't figure out where I'm going wrong.
```
bfcln %>%
  select(STATEFP, DP2_HC03_V, DP2_HC03V.1)
Error in tolower(use) : object 'STATEFP' not found
```
```
# create a spatial join between bf_pop and or_acs
# check CRS
crsbf <- bf_pop@proj4string
# change acs CRS to match bf_pop
oracs_reprj <- spTransform(or_acs, crsbf)
# join by spatial attributes
bf_int <- raster::intersect(bf_pop, oracs_reprj)
# truncate field names to 10 characters for ESRI formatting
names(bf_int) <- strtrim(names(bf_int), 10)
# remove duplicates from attribute table
bfcln <- bf_int[which(!duplicated(bf_int$id)), ]
```
After failing with the select() method multiple times, I tried renaming columns.
```
# rename variables of interest
bfcln1 <- bfcln %>%
  select(DP2_HC03_V) %>%
  rename(DP2_HC03_V = pcntunmar) %>%
  select(DP2_HC03_V.1) %>%
  rename(DP2_HC03_V.1 = pcntirsh)
Error in tolower(use) : object 'DP2_HC03_V' not found
```
To use dplyr verbs such as rename() on spatial (sp) objects, you'll need to install and load the package spdplyr.
Then, as with dplyr, you'd do:
df <- df %>%
rename(newName = oldName)
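If installing another package is not an option, the same selection and renaming can be sketched in base R. The example below works on a plain data.frame standing in for the attribute table (the values are made up); the same `[` column indexing and `names<-` assignment are also available on Spatial*DataFrame objects, as the strtrim() line in the question already relies on. The `tolower(use)` error may indicate that a select() from a package other than dplyr is being called, which name-based indexing sidesteps, but that is a guess.

```r
# Hypothetical stand-in for the attribute table of bfcln.
attrs <- data.frame(
  STATEFP      = c("41", "41", "41"),
  DP2_HC03_V   = c(10.5, 12.1, 9.8),
  DP2_HC03_V.1 = c(3.2, 4.0, 2.7),
  id           = 1:3
)

# Select columns by name; this does not depend on which select() is attached.
keep <- c("STATEFP", "DP2_HC03_V", "DP2_HC03_V.1")
out <- attrs[, keep]

# Rename through a lookup of the form c(old_name = "new_name").
lookup <- c(DP2_HC03_V = "pcntunmar", DP2_HC03_V.1 = "pcntirsh")
hit <- names(out) %in% names(lookup)
names(out)[hit] <- unname(lookup[names(out)[hit]])
```

Note that dplyr's rename() takes new = old, so the rename(DP2_HC03_V = pcntunmar) calls in the question have the two names the other way round.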

CSV imported data table is not possible to use for histogram plot

I created my own data set, Kwality.csv, in Excel. When I run the code below I cannot get a histogram for the data; it throws this error:
Error in hist.default(mydata) : 'x' must be numeric
library(data.table)
mydata = fread("Kwality.csv", header = FALSE)
View(mydata)
hist(mydata)
I tried to reproduce your workflow and exported an xlsx file to a csv file (using export to a comma-separated file).
First, check which characters are used as the field separator and as the decimal mark. In my case the field separator is the semicolon (;) and the decimal mark is the comma (,).
Then select the column to use for the histogram plot with the [[ ]] operator. The data table itself is not a valid argument for the hist function, which expects a numeric vector.
Taking this into consideration, you could run your code as follows:
library(data.table)
# load csv generated by NORMSINV(RAND()) in Excel
mydata = fread("check.csv", header = FALSE, sep = ";", dec = ",")
mydata
# hist(mydata)
# Error in hist.default(mydata) : 'x' must be numeric
# does not work
# access by column, e.g. third column - OK
hist(mydata[[3]])
Output: a histogram of the third column (image not included).
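For anyone without the csv at hand, here is a small self-contained sketch of the same idea using only base R. The inline data is made up, and read.csv(text = ...) stands in for fread(); plot = FALSE is used so that the example runs without opening a graphics device.

```r
# Made-up semicolon-separated data with decimal commas, as exported by some Excel locales.
csv_text <- "1,5;2,5;0,3\n2,0;3,5;-0,4\n3,5;1,0;1,2\n4,0;2,5;0,1"
mydata <- read.csv(text = csv_text, header = FALSE, sep = ";", dec = ",")

# With the right sep/dec every column should be parsed as numeric.
is_num <- vapply(mydata, is.numeric, logical(1))

# Passing the whole table to hist() fails; capture the message instead of stopping.
err <- tryCatch(hist(mydata, plot = FALSE), error = function(e) conditionMessage(e))

# Passing a single column (a numeric vector) works.
h <- hist(mydata[[3]], plot = FALSE)
```

If is_num contains FALSE for a column you expect to be numeric, the separator or decimal mark is probably still wrong, and hist() on that column will fail with the same message.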

Autocompletion error during the data frame column selection in RStudio

I used the readxl package to import from the Excel file into RStudio. Now I'm trying to access a column in that dataset using the $ operator. However, I keep getting the notification:
(Type Error): null is not an object (evaluating a.length)
Even though I've performed this type of operation many times before without issue...
[Screenshot: the error I'm getting]
[Screenshot: the dataset in the Global Environment pane]
The root of the problem is an NA used as a column name. The error is thrown because RStudio's autocompletion cannot extract the column names.
Here is a reproduction of the problem:
df <- data.frame(a = 1:3, b = 1:3)
names(df)[2] <- NA
If you then type df$a, the error from the question is generated.
To avoid this situation, assign the data.frame column names explicitly. You have two options:
assign the names yourself: names(df) <- c("a", "b");
delete the spacer columns from the source Excel file so that NA is not used as a column name.
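When there are many columns, typing all names by hand is tedious. A minimal sketch that only replaces the missing or empty names is shown below; the "col" prefix is an arbitrary choice for illustration.

```r
df <- data.frame(a = 1:3, b = 1:3)
names(df)[2] <- NA

# Flag names that are NA or empty strings.
bad <- is.na(names(df)) | !nzchar(names(df))

# Give the flagged columns placeholder names based on their position.
names(df)[bad] <- paste0("col", which(bad))
```

After this, names(df) no longer contains NA, so `$` completion has proper names to work with.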

eqmcc function in R QCA package exiting with error

When I attempt to call eqmcc() against a truthTable object, the result is this error message:
Error: The outcome's length should be the same as the number of rows in the data.
Here's my script:
library(QCA); library (psych); library(readr)
gamson <- read_csv("/path/to/Gamson.csv", col_names = TRUE)
is.na(gamson)
ttACP2 <- truthTable(data=gamson, outcome = "ACP", conditions = "BUR, LOW, DIS, HLP", n.cut=3, incl.cut=0.750, sort.by="incl, n", complete=FALSE, show.cases=TRUE)
ttACP2
csACP2 <- eqmcc(ttACP2, details=TRUE, show.cases=TRUE, row.dom=TRUE, all.sol=FALSE, use.tilde=FALSE)
The is.na() function shows that there are no missing values in my data set. The data set contains 54 rows, of which the first is the column names. The truth table is generated according to expectations. But the minimization of the selected causal conditions fails.
I found a chunk of source code that matches the error message on line 90 here:
https://github.com/cran/QCApro/blob/master/R/pof.R
But I'm not competent enough in programming to understand what conditions lead to the error message being thrown.
This is because your dataset is a tibble instead of a data frame: read_csv() returns a tibble. After loading the dataset, and before building the truth table, do this:
gamson <- as.data.frame(gamson)
It should work after that. (In the latest version of the package, the eqmcc() function is called minimize().)
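A plausible reason the length check fails, offered as an assumption rather than something confirmed from the QCA source: a tibble keeps `data[, "ACP"]` as a one-column table, whereas a data frame drops it to a vector. The sketch below mimics the tibble behaviour in base R with drop = FALSE, on made-up data.

```r
# Made-up stand-in for the gamson data.
gamson <- data.frame(ACP = c(1, 0, 1, 1), BUR = c(1, 1, 0, 0))

# Data frame behaviour: a single column is dropped to a plain vector.
as_vector <- gamson[, "ACP"]

# Tibble-like behaviour: the result stays a one-column table.
as_table <- gamson[, "ACP", drop = FALSE]

len_vector <- length(as_vector)  # number of rows
len_table  <- length(as_table)   # number of columns
```

Code that compares the length of the outcome with nrow(data) would see len_table instead of len_vector when handed a tibble, which would match the error message in the question.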

cannot handle matrix/array columns with write.dbf

I hope I have everything together for this problem. It's my first time posting and it's a little bit tricky to describe.
I want to add some attributes to a dbf file and save it afterwards for use in QGIS. It's about elections, and the data are the votes for the 11 parties in absolute and relative values. I use the shapefiles package for this, but I also tried it simply with foreign.
My system: RStudio 0.97.311, R 2.15.2, shapefiles 0.7, foreign 0.8-52, Ubuntu 12.04
try #1 => no problems
shpDistricts <- read.shapefile(filename)
shpDataDistricts <- shpDistricts$dbf[[1]]
shpDataDistricts <- shpDataDistricts[, -c(3, 4, 5)] # delete some columns
shpDistricts$dbf[[1]] <- shpDataDistricts
write.shapefile(shpDistricts, filename)
try #2 => "error in get("write.dbf", "package:foreign")(dbf$dbf, out.name) : cannot handle matrix/array columns"
shpDistricts <- read.shapefile(filename)
shpDataDistricts <- shpDistricts$dbf[[1]]
shpDataDistricts <- shpDataDistricts[, -c(3, 4, 5)] # delete some columns
shpDataDistricts <- cbind(shpDataDistricts, votesDistrict[, 2]) # add a new column
names(shpDataDistricts)[5] <- "SPOE"
shpDistricts$dbf[[1]] <- shpDataDistricts
write.shapefile(shpDistricts, filename)
The write function returns the error shown above.
So simply by adding an (integer) column to the data.frame, the write.dbf function is no longer able to write the file. I have now been debugging this simple issue for 3 hours. I tried it with the shapefiles package by opening the shapefile and the dbf file, and I get the same problem every time, also when I use the foreign package directly (read.dbf).
If I save the dbf file without the voting data (only with the small adaptations from steps 1+2), there is no problem. It must have to do with the merge with the voting data.
I got the same error message (error in get("write.dbf" ...)) while working with shapefiles in R using rgdal. I added a column to the shapefile, then tried to save the output and got the error. I had added the column to the shapefile as a data frame; when I converted it to a factor via as.factor(), the error went away.
shapefile$column <- as.factor(additional.column)
writePolyShape(shapefile, filename)
The problem is that write.dbf cannot write a column that is itself a data frame into an attribute table, so I changed it to character data.
My initial, wrong code was:
d1<-data.frame(as.character(data1))
colnames(d1)<-c("county") #using rbind should give them same column name
d2<-data.frame(as.character(data2))
colnames(d2)<-c("county")
county<-rbind(d1,d2)
dbfdata$county <- county
write.dbf(dbfdata, "PANY_animals_84.dbf") ## doesn't work
## Error in write.dbf(dataname, ".bdf") : cannot handle matrix/array columns
Then I changed everything to character vectors, and it works! The correct code is:
d1<-as.character(data1)
d2<-as.character(data2)
county<-c(d1,d2)
dbfdata$county <- county
write.dbf(dbfdata, "filename")
Hope it helps!
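To find out which column triggers the message before calling write.dbf, one can look for columns that carry a dim attribute (matrices or nested data frames). This is a sketch on made-up data; that write.dbf rejects exactly such columns is my reading of the error text, not something taken from its documentation.

```r
# Made-up attribute table with a nested data-frame column, as in the wrong code above.
dbfdata <- data.frame(id = 1:4)
county <- rbind(data.frame(county = c("a", "b")),
                data.frame(county = c("c", "d")))
dbfdata$county <- county   # the column is now a data frame, not a vector

# Columns with a dim attribute are matrices or data frames.
has_dim_before <- vapply(dbfdata, function(col) !is.null(dim(col)), logical(1))

# Repair: replace the nested column with a plain character vector.
dbfdata$county <- as.character(county$county)
has_dim_after <- vapply(dbfdata, function(col) !is.null(dim(col)), logical(1))
```

In the first question, the same check on shpDataDistricts after the cbind() call would show whether votesDistrict[, 2] came through as a plain vector or as a one-column table.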
