I think I am missing something rather simple here, but what is the syntax for adding arguments to the write.graph function in R's igraph package? I am trying to output a network to a Pajek-formatted file (.net) with weighted edges and vertex IDs. I've tried the following commands, but keep getting errors ("Unknown arguments to write.graph (Pajek format)."):
write.graph(weightedg,file="musGiant2012.net", format="pajek",'weight')
write.graph(weightedg,file="musGiant2012.net", format="pajek", id=TRUE)
write.graph(weightedg,file="musGiant2012.net", format="pajek", ("id"))
Plus many others. I am pretty sure that I am committing a simple syntax error, but cannot find any guidance on how to correct it.
From the docs at http://igraph.org/r/doc/write.graph.html:
The Pajek format is a text file, see read.graph for details. Appropriate vertex and edge attributes are also written to the file. This format has no additional arguments.
And http://igraph.org/r/doc/read.graph.html shows that edge weights are supported, and vertex ids are supported as well. So if you have your vertex ids as an attribute called id, and your edge weights as an attribute called weight, then you do not need any extra argument. E.g.
library(igraph)
g <- graph.ring(5)
V(g)$id <- letters[1:5]
E(g)$weight <- runif(ecount(g))
tmp <- tempfile()
write.graph(g, file = tmp, format = "pajek")
cat(readLines(tmp), sep = "\n")
#> *Vertices 5
#> 1 "a"
#> 2 "b"
#> 3 "c"
#> 4 "d"
#> 5 "e"
#> *Edges
#> 1 2 0.054399197222665
#> 2 3 0.503386947326362
#> 3 4 0.373047293629497
#> 4 5 0.84542120853439
#> 1 5 0.610330935101956
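If the exported file still lacks weights or ids, it is worth checking what your graph's attributes are actually called. A quick sanity check (a sketch, assuming a reasonably recent igraph, which provides vertex_attr_names() and edge_attr_names()):
library(igraph)
vertex_attr_names(weightedg)  # should include "id"
edge_attr_names(weightedg)    # should include "weight"
# If the weights sit under another attribute name ("myweights" is a
# hypothetical example), copy them over before writing:
# E(weightedg)$weight <- E(weightedg)$myweights
write.graph(weightedg, file = "musGiant2012.net", format = "pajek")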
I am trying to create dynamic data frame names within a for loop, using the paste function in R to build the names. See the example below:
for (i in 1:3){
paste("Data",i,sep="") <- data.frame(colone=c(1,2,3,4),coltwo=c(5,6,7,8))
paste("New data",i,sep="") <- paste("Data",i,sep="") %>% mutate(colthree=(colone+coltwo)*i) %>% select(colthree)
}
The code above won't work because R doesn't accept the result of paste as a data frame name on the left-hand side of an assignment. I have found some solutions using the assign function, which helps with my first line of code: assign(paste("Data",i,sep=""), data.frame(colone=c(1,2,3,4),coltwo=c(5,6,7,8))). But I don't know what to do with the second line, where the paste function is used twice to refer to multiple data frames. I'm not sure a nested assign call works, and even if it does, the code will look terrible in more complex cases.
I know there might be ways to combine the two lines above into a single assign statement or similar, but is there any way to refer to two dynamic data frame names within a single line of code, as in my example above?
Many thanks :)
If you need both data frames ("Data i" and "New Data i") you can use:
library(dplyr)
for (i in 1:3){
  assign(paste("New data", i, sep = ""),
         data.frame(assign(paste("Data", i, sep = ""),
                           data.frame(colone = c(1, 2, 3, 4), coltwo = c(5, 6, 7, 8))) %>%
                      mutate(colthree = (colone + coltwo) * i) %>%
                      select(colthree)))
}
This works because assign() invisibly returns the value it assigns, which can then be piped onward.
If you only want "New Data i" use:
for (i in 1:3){
  assign(paste("New data", i, sep = ""),
         data.frame(colone = c(1, 2, 3, 4), coltwo = c(5, 6, 7, 8)) %>%
           mutate(colthree = (colone + coltwo) * i) %>%
           select(colthree))
}
Note that the whole pipeline must sit inside assign(); piping from the result of assign() itself would assign the unmutated data frame and discard the mutated one.
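To answer the literal question of referring to two dynamic data frame names in a single line: get() looks up an object by its name given as a string, so it can be paired with assign(). A sketch:
library(dplyr)
for (i in 1:3){
  assign(paste("Data", i, sep = ""),
         data.frame(colone = c(1, 2, 3, 4), coltwo = c(5, 6, 7, 8)))
  assign(paste("New data", i, sep = ""),
         get(paste("Data", i, sep = "")) %>%
           mutate(colthree = (colone + coltwo) * i) %>%
           select(colthree))
}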
This seems to be working but it's a little bit convoluted:
library(tidyverse)
library(stringr)
library(rlang)
#>
#> Attaching package: 'rlang'
#> The following objects are masked from 'package:purrr':
#>
#> %@%, as_function, flatten, flatten_chr, flatten_dbl, flatten_int,
#> flatten_lgl, flatten_raw, invoke, list_along, modify, prepend,
#> splice
i <- seq(1, 3, 1) #how many loops
pwalk(list(paste("Data",i,sep=""), paste("New_data",i,sep=""), i), ~{
assign(..1, data.frame(colone=c(1,2,3,4),coltwo=c(5,6,7,8)), envir = .GlobalEnv)
new <- sym(..1) #convert a string to a variable name
assign(..2, {eval_tidy(new)%>% mutate(colthree=(colone+coltwo)*..3) %>% select(colthree)}, envir = .GlobalEnv)
})
names(.GlobalEnv)
#> [1] "Data1" "Data2" "Data3" "i" "New_data1" "New_data2"
#> [7] "New_data3"
Data1
#> colone coltwo
#> 1 1 5
#> 2 2 6
#> 3 3 7
#> 4 4 8
New_data1
#> colthree
#> 1 6
#> 2 8
#> 3 10
#> 4 12
Created on 2021-06-10 by the reprex package (v2.0.0)
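A commonly recommended alternative is to skip assign()/get() altogether and keep the data frames in named lists, since R has great tools for working with lists. A minimal sketch, assuming dplyr and purrr:
library(dplyr)
library(purrr)
data_list <- map(1:3, ~ data.frame(colone = c(1, 2, 3, 4),
                                   coltwo = c(5, 6, 7, 8)))
names(data_list) <- paste0("Data", 1:3)
new_list <- map2(data_list, 1:3, ~ .x %>%
                   mutate(colthree = (colone + coltwo) * .y) %>%
                   select(colthree))
names(new_list) <- paste0("New_data", 1:3)
new_list$New_data2  # access by name instead of a dynamic variable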
I have a list in which each element has fields with the same names (only the values differ), and I need to convert it into a data.frame whose column names match the field names. Following is my list:
Data input (data input in json format.json)
library(rjson)
data <- fromJSON(file = "data input in json format.json")
head(data,3)
[[1]]
[[1]]$floors
[1] 5
[[1]]$elevation
[1] 15
[[1]]$bmi
[1] 23.7483
[[2]]
[[2]]$floors
[1] 4
[[2]]$elevation
[1] 12
[[2]]$bmi
[1] 23.764
[[3]]
[[3]]$floors
[1] 3
[[3]]$elevation
[1] 9
[[3]]$bmi
[1] 23.7797
And my expected data.frame is,
floors elevation bmi
5 15 23.7483
4 12 23.7640
3 9 23.7797
Can you help me figure this out?
Thanks in advance.
You can use jsonlite.
library(jsonlite)
Then use fromJSON() and specify the path to your file (or alternatively a URL or the raw text) in the argument txt:
fromJSON(txt = 'path/to/json/file.json')
The result is:
floors elevation bmi
1 5 15 23.7483
2 4 12 23.7640
3 3 9 23.7797
If you prefer rjson, you could first read it as previously:
data <- rjson::fromJSON(file = 'path/to/json/file.json')
Then use do.call() and rbind.data.frame() to convert the list to a dataframe:
do.call("rbind.data.frame", data)
Alternatively to do.call(), you can use data.table's rbindlist(), which is faster:
data.table::rbindlist(data)
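Yet another option, assuming every list element has the same field names: dplyr's bind_rows() flattens a list of named lists into a data frame in one call.
library(dplyr)
bind_rows(data)  # one row per list element, columns named after the fields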
This question is a possible duplicate of Lemmatizer in R or python (am, are, is -> be?), but I'm adding it again since the previous one was closed as too broad and the only answer it has is not efficient (it accesses an external website, which is too slow because I have a very large corpus to find lemmas for). So part of this question will be similar to the one mentioned above.
According to Wikipedia, lemmatization is defined as:
Lemmatisation (or lemmatization) in linguistics, is the process of grouping together the different inflected forms of a word so they can be analysed as a single item.
A simple Google search for lemmatization in R will only point to the R package wordnet. When I tried this package, expecting that a character vector c("run", "ran", "running") passed to the lemmatization function would result in c("run", "run", "run"), I saw that it only provides functionality similar to the grepl function, through various filter names and a dictionary.
An example from the wordnet package, which returns a maximum of 5 words starting with "car", as the filter name itself explains:
filter <- getTermFilter("StartsWithFilter", "car", TRUE)
terms <- getIndexTerms("NOUN", 5, filter)
sapply(terms, getLemma)
The above is NOT the lemmatization that I'm looking for. What I'm looking for is a way, using R, to find the true roots of words: e.g. from c("run", "ran", "running") to c("run", "run", "run").
Hello, you can try the koRpus package, which allows you to use TreeTagger:
tagged.results <- treetag(c("run", "ran", "running"), treetagger="manual", format="obj",
TT.tknz=FALSE , lang="en",
TT.options=list(path="./TreeTagger", preset="en"))
tagged.results@TT.res
## token tag lemma lttr wclass desc stop stem
## 1 run NN run 3 noun Noun, singular or mass NA NA
## 2 ran VVD run 3 verb Verb, past tense NA NA
## 3 running VVG run 7 verb Verb, gerund or present participle NA NA
See the lemma column for the result you're asking for.
As a previous post mentioned, the function lemmatize_words() from the R package textstem can perform this and give you what I understand to be your desired results:
library(textstem)
vector <- c("run", "ran", "running")
lemmatize_words(vector)
## [1] "run" "run" "run"
@Andy and @Arunkumar are correct when they say the textstem library can be used to perform stemming and/or lemmatization. However, lemmatize_words() only works on a vector of words. In a corpus we do not have vectors of words; we have strings, with each string being a document's content. Hence, to perform lemmatization on a corpus, you can pass the function lemmatize_strings() as an argument to tm_map() of the tm package.
> corpus[[1]]
[1] " earnest roughshod document serves workable primer regions recent history make
terrific th-grade learning tool samuel beckett applied iranian voting process bard
black comedy willie loved another trumpet blast may new mexican cinema -bornin "
> corpus <- tm_map(corpus, lemmatize_strings)
> corpus[[1]]
[1] "earnest roughshod document serve workable primer region recent history make
terrific th - grade learn tool samuel beckett apply iranian vote process bard black
comedy willie love another trumpet blast may new mexican cinema - bornin"
Do not forget to run the following line of code after you have done lemmatization:
> corpus <- tm_map(corpus, PlainTextDocument)
This is because in order to create a document-term matrix, you need to have 'PlainTextDocument' type object, which gets changed after you use lemmatize_strings() (to be more specific, the corpus object does not contain content and meta-data of each document anymore - it is now just a structure containing documents' content; this is not the type of object that DocumentTermMatrix() takes as an argument).
Hope this helps!
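Putting those steps together, a minimal sketch of the whole pipeline (assuming corpus is a tm corpus you have already built):
library(tm)
library(textstem)
corpus <- tm_map(corpus, lemmatize_strings)   # lemmatize each document's text
corpus <- tm_map(corpus, PlainTextDocument)   # restore PlainTextDocument objects
dtm <- DocumentTermMatrix(corpus)             # now this works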
Maybe stemming is enough for you? Typical natural language processing tasks make do with stemmed texts. You can find several packages in the CRAN Task View on NLP: http://cran.r-project.org/web/views/NaturalLanguageProcessing.html
If you really do require something more complex, then there are specialized solutions based on mapping sentences to neural nets. As far as I know, these require massive amounts of training data. There is lots of open software created and made available by the Stanford NLP Group.
If you really want to dig into the topic, then you can dig through the event archives linked in the same Stanford NLP Group publications section. There are some books on the topic as well.
I think the answers are a bit outdated here. You should be using R package udpipe now - available at https://CRAN.R-project.org/package=udpipe - see https://github.com/bnosac/udpipe or docs at https://bnosac.github.io/udpipe/en
Notice the difference between the word meeting (NOUN) and the word meet (VERB) in the following example when lemmatising versus stemming, and the annoying mangling of the word 'someone' to 'someon' when stemming.
library(udpipe)
x <- c(doc_a = "In our last meeting, someone said that we are meeting again tomorrow",
doc_b = "It's better to be good at being the best")
anno <- udpipe(x, "english")
anno[, c("doc_id", "sentence_id", "token", "lemma", "upos")]
#> doc_id sentence_id token lemma upos
#> 1 doc_a 1 In in ADP
#> 2 doc_a 1 our we PRON
#> 3 doc_a 1 last last ADJ
#> 4 doc_a 1 meeting meeting NOUN
#> 5 doc_a 1 , , PUNCT
#> 6 doc_a 1 someone someone PRON
#> 7 doc_a 1 said say VERB
#> 8 doc_a 1 that that SCONJ
#> 9 doc_a 1 we we PRON
#> 10 doc_a 1 are be AUX
#> 11 doc_a 1 meeting meet VERB
#> 12 doc_a 1 again again ADV
#> 13 doc_a 1 tomorrow tomorrow NOUN
#> 14 doc_b 1 It it PRON
#> 15 doc_b 1 's be AUX
#> 16 doc_b 1 better better ADJ
#> 17 doc_b 1 to to PART
#> 18 doc_b 1 be be AUX
#> 19 doc_b 1 good good ADJ
#> 20 doc_b 1 at at SCONJ
#> 21 doc_b 1 being be AUX
#> 22 doc_b 1 the the DET
#> 23 doc_b 1 best best ADJ
lemmatisation <- paste.data.frame(anno, term = "lemma",
group = c("doc_id", "sentence_id"))
lemmatisation
#> doc_id sentence_id
#> 1 doc_a 1
#> 2 doc_b 1
#> lemma
#> 1 in we last meeting , someone say that we be meet again tomorrow
#> 2 it be better to be good at be the best
library(SnowballC)
tokens <- strsplit(x, split = "[[:space:][:punct:]]+")
stemming <- lapply(tokens, FUN = function(x) wordStem(x, language = "en"))
stemming
#> $doc_a
#> [1] "In" "our" "last" "meet" "someon" "said"
#> [7] "that" "we" "are" "meet" "again" "tomorrow"
#>
#> $doc_b
#> [1] "It" "s" "better" "to" "be" "good" "at" "be"
#> [9] "the" "best"
Lemmatization can be done easily in R with the textstem package.
Steps are:
1) Install textstem
2) Load the package by
library(textstem)
3) stem_word=lemmatize_words(word, dictionary = lexicon::hash_lemmas)
where stem_word is the result of lemmatization and word is the input word.
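Put together, a minimal runnable sketch:
library(textstem)
lemmatize_words(c("run", "ran", "running"), dictionary = lexicon::hash_lemmas)
## [1] "run" "run" "run"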
How do I read the following vector "c" of strings into a list of tables? Which way is the shortest - read.table or strsplit? E.g. I can't see how to read the table in c[4:6] in one command.
require(car)
m<-matrix(rnorm(16),4,4,byrow=T)
a<-Anova(lm(m~1),type=3,idata=data.frame(treatment=factor(1:4)),idesign=~treatment)
c<-capture.output(summary(a,multivariate=F))
c
This returns lines 4:6
c[4:6]
Now, if you wanted to parse this, I would do it in two steps: first read the column values from rows 5:6, then add back the names.
> vals <- read.table(text=c[5:6])
> txt <- " \t SS\t num Df\t Error SS\t den Df\t F\t Pr(>F)"
> names(vals) <- names(read.delim(text=txt))
> vals
X SS num.Df Error.SS den.Df F Pr..F.
1 (Intercept) 0.57613392 1 0.4219563 3 4.09616 0.13614
2 treatment 1.85936442 3 8.2899759 9 0.67287 0.58996
EDIT --
You could look at the source code of the summary function and calculate the required quantities yourself:
getAnywhere(summary.Anova.mlm)
The original idea seems not to work.
c2 <- summary(a)
# find out what 'properties' the summary object has
# turns out, it is just the Anova object
class(c2) <- "list"
names(c2)
This returns
[1] "SSP" "SSPE" "P" "df" "error.df"
[6] "terms" "repeated" "type" "test" "idata"
[11] "idesign" "icontrasts" "imatrix" "singular"
and we can get access them
c2$SSP
c2$SSPE
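The degrees of freedom are available the same way, per the names listed above (a sketch):
c2$df        # numerator df for each term
c2$error.df  # denominator (error) df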
As an aside, it is not a good idea to use the name of R's built-in c function as a variable name.
>titletool<-read.csv("TotalCSVData.csv",header=FALSE,sep=",")
> class(titletool)
[1] "data.frame"
>titletool[1,1]
[1] Experiment name : CONTROL DB AD_1
>t<-titletool[1,1]
>t
[1] Experiment name : CONTROL DB AD_1
>class(t)
[1] "character"
Now I want to create an object (vector) with the name "Experiment name : CONTROL DB AD_1", or even better, if possible, CONTROL DB AD_1.
Thank you
Use assign:
varname <- "Experiment name : CONTROL DB AD_1"
assign(varname, 3.14158)
get("Experiment name : CONTROL DB AD_1")
[1] 3.14158
And you can use a regular expression and sub or gsub to remove some text from a string:
cleanVarname <- sub("Experiment name : ", "", varname)
assign(cleanVarname, 42)
get("CONTROL DB AD_1")
[1] 42
But let me warn you this is an unusual thing to do.
Here be dragons.
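A safer pattern than assign()/get() is a named list, where arbitrary strings are fine as names. A sketch:
results <- list()
results[["CONTROL DB AD_1"]] <- 42
results[["CONTROL DB AD_1"]]
[1] 42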
If I understand correctly, you have a bunch of CSV files, each with multiple experiments in them, named in the pattern "Experiment ...". You now want to read each of these "experiments" into R in an efficient way.
Here's a not-so-pretty (but not-so-ugly either) function that might get you started in the right direction.
What the function basically does is read in the CSV, identify the line numbers where each new experiment starts, grabs the names of the experiments, then does a loop to fill in a list with the separate data frames. It doesn't really bother making "R-friendly" names though, and I've decided to leave the output in a list, because as Andrie pointed out, "R has great tools for working with lists."
read.funkyfile = function(funkyfile, expression, ...) {
  temp = readLines(funkyfile)
  # line numbers where each new experiment header appears
  temp.loc = grep(expression, temp)
  temp.loc = c(temp.loc, length(temp) + 1)
  # experiment names, stripped of punctuation
  temp.nam = gsub("[[:punct:]]", "",
                  grep(expression, temp, value = TRUE))
  temp.out = vector("list")
  for (i in 1:length(temp.nam)) {
    # read the lines between this header and the next as CSV
    temp.out[[i]] = read.csv(textConnection(
      temp[seq(from = temp.loc[i] + 1,
               to = temp.loc[i + 1] - 1)]),
      ...)
    names(temp.out)[i] = temp.nam[i]
  }
  temp.out
}
Here is an example CSV file. Copy and paste it into a text editor and save it as "funkyfile1.csv" in the current working directory. (Or, read it in from Dropbox: http://dl.dropbox.com/u/2556524/testing/funkyfile1.csv)
"Experiment Name: Here Be",,
1,2,3
4,5,6
7,8,9
"Experiment Name: The Dragons",,
10,11,12
13,14,15
16,17,18
Here is a second CSV. Again, copy-paste and save it as "funkyfile2.csv" in your current working directory. (Or, read it in from Dropbox: http://dl.dropbox.com/u/2556524/testing/funkyfile2.csv)
"Promises: I vow to",,
"H1","H2","H3"
19,20,21
22,23,24
25,26,27
"Promises: Slay the dragon",,
"H1","H2","H3"
28,29,30
31,32,33
34,35,36
Notice that funkyfile1 has no column names, while funkyfile2 does. That's what the ... argument in the function is for: to specify header=TRUE or header=FALSE. Also the "expression" identifying each new set of data is "Promises" in funkyfile2.
Now, use the function:
read.funkyfile("funkyfile1.csv", "Experiment", header=FALSE)
# read.funkyfile("http://dl.dropbox.com/u/2556524/testing/funkyfile1.csv",
# "Experiment", header=FALSE) # Uncomment to load remotely
# $`Experiment Name Here Be`
# V1 V2 V3
# 1 1 2 3
# 2 4 5 6
# 3 7 8 9
#
# $`Experiment Name The Dragons`
# V1 V2 V3
# 1 10 11 12
# 2 13 14 15
# 3 16 17 18
read.funkyfile("funkyfile2.csv", "Promises", header=TRUE)
# read.funkyfile("http://dl.dropbox.com/u/2556524/testing/funkyfile2.csv",
# "Experiment", header=TRUE) # Uncomment to load remotely
# $`Promises I vow to`
# H1 H2 H3
# 1 19 20 21
# 2 22 23 24
# 3 25 26 27
#
# $`Promises Slay the dragon`
# H1 H2 H3
# 1 28 29 30
# 2 31 32 33
# 3 34 35 36
Go get those dragons.
Update
If your data are all in the same format, you can use the lapply solution mentioned by Andrie along with this function. Just make a list of the CSVs that you want to load, as below. Note that, the way the function is currently written, the files all need to use the same "expression" and other arguments (see the sketch after the code for varying them).
temp = list("http://dl.dropbox.com/u/2556524/testing/funkyfile1.csv",
"http://dl.dropbox.com/u/2556524/testing/funkyfile3.csv")
lapply(temp, read.funkyfile, "Experiment", header=FALSE)
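If the files follow different patterns or header settings, Map() can vary those arguments per file. A hypothetical sketch using the two example files from above:
files <- c("funkyfile1.csv", "funkyfile2.csv")
patterns <- c("Experiment", "Promises")
Map(read.funkyfile, files, patterns, header = c(FALSE, TRUE))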