Retrieve synonyms of words using wordnet for R - r

I'm currently working with wordnet in R (I'm using RStudio for Windows (64bit)) and created a data.frame containing synset_offset, ss_type and word from the data.x files (where x is noun, adj, etc) of the wordnet database.
A sample can be created like this:
wnet <- data.frame(
"synset_offset" = c(02370954,02371120,02371337),
"ss_type" = c("VERB","VERB","VERB"),
"word" = c("fill", "depute", "substitute")
)
My issue happens when using the wordnet package to get the list of synonyms that I'd like to add as an additional column.
library(wordnet)
wnet$synonyms <- synonyms(wnet$word,wnet$ss_type)
I receive the following error.
Error in .jnew(paste("com.nexagis.jawbone.filter", type, sep = "."), word, :
java.lang.NoSuchMethodError: <init>
If I apply the function with defined values, it works.
> synonyms("fill","VERB")
[1] "fill" "fill up" "fulfil" "fulfill" "make full" "meet" "occupy" "replete" "sate" "satiate" "satisfy"
[12] "take"
Any suggestions to solve my issue are welcome.

I can't install the wordnet package on my computer for some reason, but it seems you're passing whole vectors to the synonyms function, which it can't handle. You should be able to solve it with apply:
syn_list <- apply(wnet, 1, function(row) synonyms(row["word"], row["ss_type"]))
It will return the output of the synonyms function for each row of the wnet data.frame.
It's not clear what you want to do with:
wnet$synonyms <- synonyms(wnet$word,wnet$ss_type)
because each row yields a vector of synonyms, which doesn't fit into the 3 rows of your data.frame.
Maybe something like this will work for you:
wnet$synonyms <- sapply(syn_list,paste,collapse=", ")
EDIT - Here is a working solution to the problem above.
wnet$synset <- mapply(synonyms, as.character(wnet$word), as.character(wnet$ss_type))
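To see why mapply helps here, this is a minimal sketch that runs without wordnet or Java. toy_lookup is a hypothetical stand-in for synonyms (it is not part of any package); like synonyms, it is assumed to accept only one word and one type per call and to return a vector whose length varies.

```r
wnet <- data.frame(
  word    = c("fill", "depute", "substitute"),
  ss_type = c("VERB", "VERB", "VERB"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for wordnet::synonyms(): takes exactly one word and
# one type, and returns a character vector whose length depends on the word.
toy_lookup <- function(word, type) {
  stopifnot(length(word) == 1L, length(type) == 1L)
  paste0(tolower(type), ":", strsplit(word, "")[[1]])
}

# mapply() calls toy_lookup() once per row, pairing the i-th word with the
# i-th type. SIMPLIFY = FALSE makes it always return a list.
res <- mapply(toy_lookup, wnet$word, wnet$ss_type, SIMPLIFY = FALSE)

# Collapse each result to a single string so it fits an ordinary column.
wnet$toy <- vapply(res, paste, character(1), collapse = ", ")
print(wnet)
```

Keeping the list (instead of collapsing it) also works if you want a list column; collapsing is only one option.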

Related

Write stata dataframe in R [duplicate]

I am getting an error while converting an R data frame into Stata format. I am able to convert the numbers into a Stata file, but when I include strings I get the following error:
library(foreign)
write.dta(newdata, "X.dta")
Error in write.dta(newdata, "X.dta") :
empty string is not valid in Stata's documented format
I have a few string variables like location, name, etc. with missing values, which is probably causing this problem. Is there a way to handle this?
I've had this error many times before, and it's easy to reproduce:
library(foreign)
test <- data.frame(a = "", b = 1, stringsAsFactors = FALSE)
write.dta(test, 'example.dta')
One solution is to use factor variables instead of character variables, e.g.,
for (colname in names(test)) {
if (is.character(test[[colname]])) {
test[[colname]] <- as.factor(test[[colname]])
}
}
Another is to change the empty strings to something else and change them back in Stata.
This is purely a problem with write.dta, because Stata is perfectly fine with empty strings. But since foreign is frozen, there's not much you can do about that.
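The second workaround (changing the empty strings to something else) could look roughly like this in base R. The placeholder value "__EMPTY__" is an arbitrary choice for illustration; pick anything that cannot occur in your real data, and replace it again on the Stata side.

```r
test <- data.frame(a = c("", "x"), b = 1:2, stringsAsFactors = FALSE)

# Arbitrary marker for empty strings (an assumption, not required by foreign).
placeholder <- "__EMPTY__"

# Find the character columns and replace "" in each of them.
is_chr <- vapply(test, is.character, logical(1))
test[is_chr] <- lapply(test[is_chr], function(col) {
  col[!is.na(col) & col == ""] <- placeholder
  col
})

print(test)
```

After this step no character column contains an empty string, so the data frame can be passed on to write.dta.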
Update: (2015-12-04) A better solution is to use write_dta in the haven package:
library(haven)
test <- data.frame(a = "", b = 1, stringsAsFactors = FALSE)
write_dta(test, 'example.dta')
This way, Stata reads string variables properly as strings.
You could use the great readstata13 package (which kindly imports only the Rcpp package).
readstata13::save.dta13(mtcars, 'mtcars.dta')
The function can also save in the Stata 15/16 MP file format (experimental), which is the next update after the Stata 13 format.
readstata13::save.dta13(mtcars, 'mtcars15.dta', version="15mp")
Note: Of course, this also works with OP's data:
readstata13::save.dta13(data.frame(a="", b=1), 'my_data.dta')

R function doesn't recognize variable

I am not very familiar with loops in R, and am having a hard time stating a variable such that it is recognized by a function, DESeqDataSetFromMatrix.
pls is a table of integers. metaData is a data frame containing sample IDs and conditions corresponding to pls. I verified that the steps below run error-free with the individual elements of cond.
I reviewed relevant posts on referencing variables in R:
How to reference variable names in a for loop in R?
How to reference a variable in a for loop?
Based on these posts, I modified i in line 3 with single brackets, double brackets and "as.name". No luck. DESeqDataSetFromMatrix is reading the literal text after ~ and spits out an error.
cond=c("wt","dhx","mpp","taz")
for(i in cond){
dds <- DESeqDataSetFromMatrix(countData=pls,colData=metaData,design=~i, tidy = TRUE)
"sizeFactors"(dds) <- 1
paste0("PLS",i)<-DESeq(dds)
pdf <- paste(i,"-PLS_MA.pdf",sep="")
tsv <- paste(i,"-PLS.tsv",sep="")
pdf(file=pdf,paper = "a4r", width = 0, height = 0)
plotMA(paste0("PLS",i),ylim=c(-10,10))
dev.off()
write.table(results(paste0("PLS",i)),file = tsv,quote=FALSE, sep='\t', col.names = NA)
}
With brackets, I get an unexpected symbol error.
With i alone, DESeqDataSetFromMatrix tries to read "i" from my metaData column.
Is R just not capable of reading variables in some situations? Generally speaking, is it better to write loops outside of R in a more straightforward language, then push as standalone commands? Thanks for the help—I hope there is an easy fix.
For anyone else who may be having trouble looping with DESeq2 functions, comments above addressed my issue.
Correct input:
dds <- DESeqDataSetFromMatrix(countData=pls,colData=metaData,design=as.formula(paste0("~", i)), tidy = TRUE)
as.formula worked well with all DESeq functions that I tested.
reformulate(i) also worked well in most situations.
Thanks, everyone for the help!
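For anyone who wants to see the difference without DESeq2 installed, here is a small base R sketch. It only builds the formulas; the variable names literal and designs are made up for this illustration.

```r
cond <- c("wt", "dhx", "mpp", "taz")

# Written literally, the formula refers to a variable called "i",
# no matter what value i has in the loop.
i <- "wt"
literal <- ~i
print(all.vars(literal))

# Building the formula from a string substitutes the value of i.
designs <- lapply(cond, function(i) as.formula(paste0("~", i)))
print(designs[[1]])

# reformulate() builds the same kind of one-sided formula from a character vector.
print(reformulate("wt"))
```

Each element of designs can then be passed as the design argument inside the loop.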

How to use a variable to select a function within a package? [duplicate]

I've created a variable and want to use that variable to select my desired function in a package (e.g. package::function), however, the variable name is interpreted literally instead of being evaluated.
Here's the approach:
library(GSEABase)
library(tidyverse)
### SET ONTOLOGY GROUP (e.g. Biological Process = BP, Molecular Function = MF, Cellular Component = CC)
ontology <- "BP"
### Set GOOFFSPRING database, based on ontology group set above
go_offspring <- paste("GO", ontology, "OFFSPRING", sep = "")
## Need to know the 'offspring' of each term in the ontology, and this is given by the data in:
GO.db::go_offspring
## Create function to parse out GO terms assigned to each GOslim
## Courtesy Bioconductor Support: https://support.bioconductor.org/p/128407/
mappedIds <-
function(df, collection, OFFSPRING)
{
map <- as.list(OFFSPRING[rownames(df)])
mapped <- lapply(map, intersect, ids(collection))
df[["go_terms"]] <- vapply(unname(mapped), paste, collapse = ";", character(1L))
df
}
## Run the function
slimsdf <- mappedIds(slimsdf, myCollection, go_offspring)
This spits out the error:
Error: 'go_offspring' is not an exported object from 'namespace:GO.db'
When playing around in the RStudio console, I also notice that when I type
GO.db::
the autocomplete feature does not list my go_offspring variable as an option; it only lists the available functions within the GO.db package.
Seeing this behavior suggests there's a scope issue, in that the package cannot see variable definitions defined outside of the package.
Is there any way around this?
I've looked at this http://adv-r.had.co.nz/Environments.html, but I'm not entirely sure I follow all of it, nor do I see how to manipulate my environments to allow passing go_offspring to GO.db::.
You can use getFromNamespace to get the function via its character name from the namespace.
slimsdf <- mappedIds(slimsdf, myCollection, getFromNamespace(go_offspring, "GO.db"))
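The same idea can be tried without GO.db. This sketch uses the stats package and the name "median" purely as an example; fn_name, f and g are illustrative variable names.

```r
# The name of the object we want, held in a variable.
fn_name <- "median"

# stats::fn_name would look for an object literally called "fn_name".
# getFromNamespace() looks up the value of the string instead.
f <- getFromNamespace(fn_name, "stats")
print(f(c(1, 5, 9)))

# An equivalent lookup with get() on the namespace environment.
g <- get(fn_name, envir = asNamespace("stats"))
print(identical(f, g))
```

So the :: operator never evaluates what is on its right-hand side; the lookup by string has to be done explicitly.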

R: subset() function altered character data into strange code

I read some data into R with read.xlsx() from the openxlsx package. Here's my code for reading the data:
data_all = read.xlsx(xlsxFile = paste0(path, EoLfileName), sheet = 1, detectDates = T, skipEmptyRows = F)
Now, when I access one name cell in my data, it prints the name as characters:
> data_all[1,'name']
[1] "76-ES+ADVIP-20G"
Now, let's say I want to subset some rows based on a condition on another column:
data_sub = subset(data_all, !is.na(data_all$amount))
However, if I then print this subset data, I get:
> data_sub[1,'name']
[1] "A94198.10"
I've also tried subsetting with the following method:
data_sub = data_all[!is.na(data_all$amount),]
But I get the same thing: the expected output of "76-ES+ADVIP-20G" is turned into "A94198.10".
I've checked many times with mode() and str() for data_all$name and data_sub$name, both return character, so they are in correct format.
Here's a link to sample data to play with:
https://drive.google.com/file/d/0BwIbultIWxeVY1VtdDU5NFp1Tkk/view?usp=sharing
Please help me! I am quite stuck, and I don't see other posts with a similar problem.
Why is this happening? Subsetting shouldn't change data formatting, correct?
Thank you in advance for your help!
Additional note (if it's helpful):
When I tried to debug, I noticed that if I view data_all in RStudio and copy and paste the name "76-ES+ADVIP-20G" into the filter bar, it cannot find it. I have to type in "76-ES", and as soon as I type the next character, which is "+", the RStudio data view filter says "no matching records found".

cannot handle matrix/array columns with write.dbf

Hope I get everything together for this problem. It's my first time and it's a little bit tricky to describe.
I want to add some attributes to a dbf file and save it afterwards for use in QGIS. It's about elections, and the data are the votes for the 11 parties in absolute and relative values. I use the shapefiles package for this, but also tried it simply with foreign.
My system: RStudio 0.97.311, R 2.15.2, shapefiles 0.7, foreign 0.8-52, Ubuntu 12.04
try #1 => no problems
shpDistricts <- read.shapefile(filename)
shpDataDistricts <- shpDistricts$dbf[[1]]
shpDataDistricts <- shpDataDistricts[, -c(3, 4, 5)] # delete some columns
shpDistricts$dbf[[1]] <- shpDataDistricts
write.shapefile(shpDistricts, filename)
try #2 => "error in get("write.dbf", "package:foreign")(dbf$dbf, out.name) : cannot handle matrix/array columns"
shpDistricts <- read.shapefile(filename)
shpDataDistricts <- shpDistricts$dbf[[1]]
shpDataDistricts <- shpDataDistricts[, -c(3, 4, 5)] # delete some columns
shpDataDistricts <- cbind(shpDataDistricts, votesDistrict[, 2]) # add a new column
names(shpDataDistricts)[5] <- "SPOE"
shpDistricts$dbf[[1]] <- shpDataDistricts
write.shapefile(shpDistricts, filename)
the write function returns "error in get("write.dbf", "package:foreign")(dbf$dbf, out.name) : cannot handle matrix/array columns"
So by simply adding an (integer) column to the data.frame, the write.dbf function isn't able to write anymore. I have now been debugging this simple issue for 3 hours. I tried it with the shapefiles package by opening the shapefile and the dbf file, always with the same problem.
When I use the foreign package directly (read.dbf) and save the dbf file without the voting data (only with the small adaptations from steps 1+2), it's no problem. It must have to do with the merge with the voting data.
I got the same error message ("error in get("write.dbf"...) while working with shapefiles in R using rgdal. I added a column to the shapefile, then tried to save the output and got the error. I had added the column to the shapefile as a data frame; when I converted it to a factor via as.factor(), the error went away.
shapefile$column <- as.factor(additional.column)
writePolyShape(shapefile, filename)
The problem is that write.dbf cannot write a column that is itself a data frame into an attribute table. So I changed it to character data.
My initial, wrong code was:
d1<-data.frame(as.character(data1))
colnames(d1)<-c("county") #using rbind should give them same column name
d2<-data.frame(as.character(data2))
colnames(d2)<-c("county")
county<-rbind(d1,d2)
dbfdata$county <- county
write.dbf(dbfdata, "PANY_animals_84.dbf") ## doesn't work
## Error in write.dbf(dataname, ".bdf") : cannot handle matrix/array columns
Then I changed everything to character, and it works! The right code is:
d1<-as.character(data1)
d2<-as.character(data2)
county<-c(d1,d2)
dbfdata$county <- county
write.dbf(dbfdata, "filename")
Hope it helps!
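If you are unsure which column is the culprit, a quick check in base R is sketched below. The toy data and the names bad and good are made up for illustration; the point is that assigning a data frame with $<- creates a nested data frame column, which is not an atomic vector.

```r
dbfdata <- data.frame(id = 1:4)
d1 <- data.frame(county = c("a", "b"), stringsAsFactors = FALSE)
d2 <- data.frame(county = c("c", "d"), stringsAsFactors = FALSE)

# Assigning the rbind() result directly nests a data frame inside the column.
bad <- dbfdata
bad$county <- rbind(d1, d2)
print(sapply(bad, is.atomic))

# Assigning a plain character vector gives an ordinary column.
good <- dbfdata
good$county <- c(as.character(d1$county), as.character(d2$county))
print(sapply(good, is.atomic))
```

Any column that reports FALSE here is a matrix, data frame or list column and needs to be flattened before writing the dbf.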
