Access first line only when output has two lines in R

I am using an R package called linkcomm; its documentation is at https://cran.r-project.org/web/packages/linkcomm/linkcomm.pdf
This is what I have run so far:
library(linkcomm)
g <- read.table("sample.txt", header = FALSE)
lc <- getLinkCommunities(g)
mc <- meta.communities(lc, hcmethod = "ward.D2", deepSplit = FALSE)
cc <- getCommunityCentrality(mc, type = "commconn")
tmp <- head(sort(cc, decreasing = TRUE))
print(tmp)
Output: 1e+14 5712365 12815415 511042 12815383 512594
3388.230 1493.165 1375.577 1350.684 1312.197 1302.445
Now the question is: how do I access only the first row of tmp, which holds the actual node IDs from the network data?
When I do tmp[1], it produces
1e+14
3388.23
when I only need 1e+14.
Here is the dput of the vector (stored as a):
dput(a)
structure(c(3388.22995373249, 1493.16521374732, 1375.57742835837,
1350.68389440675, 1312.19704460178, 1302.44518389222), .Names = c("1e+14",
"5712365", "12815415", "511042", "12815383", "512594"))

You have a named numeric vector, as str shows:
str(a)
Named num [1:6] 3388 1493 1376 1351 1312 ...
- attr(*, "names")= chr [1:6] "1e+14" "5712365" "12815415" "511042" ...
# To select the 1st element
a[1]
#   1e+14
# 3388.23
# To select the 1st element's value without the name
unname(a[1])
# [1] 3388.23
# To select the 1st element's name
names(a[1])
# [1] "1e+14"
For all names/values in the vector, you can use names(a) / unname(a).
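For instance, re-creating the vector from the dput above (a quick sketch; values copied from that output):
a <- structure(c(3388.22995373249, 1493.16521374732, 1375.57742835837,
                 1350.68389440675, 1312.19704460178, 1302.44518389222),
               .Names = c("1e+14", "5712365", "12815415", "511042",
                          "12815383", "512594"))
names(a)  # all node names
# [1] "1e+14"    "5712365"  "12815415" "511042"   "12815383" "512594"
unname(a) # all centrality values, names dropped
# [1] 3388.230 1493.165 1375.577 1350.684 1312.197 1302.445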

Related

Import table from URL with R, but numeric columns are characters

I have the following code:
library(rvest) # provides read_html() and html_table()
url <- "https://lebensmittel-naehrstoffe.de/calciumhaltige-lebensmittel/"
page <- read_html(url) # creates an html document from the URL
Ca <- html_table(page, fill = TRUE, dec = ",") # parses tables into a list of data frames
Ca <- data.frame(Ca)
But the last column of my data frame, Ca[,4], contains values with both "." and ",": it is a German table, so the decimal mark is ",", yet in R the column always ends up as character. I already tried gsub and as.numeric, but it always failed. Please note: I already set dec = ",".
Could someone help me? If possible, the solution should work across many data frames (or HTML imports or whatever), because I have many such tables...
Thank you very much!
You can use readr::parse_number:
Ca <- html_table(page, fill = TRUE, dec = ",")[[1]]
Ca$`Calciumgehalt in mg` <- readr::parse_number(
  Ca$`Calciumgehalt in mg`,
  locale = readr::locale(decimal_mark = ",", grouping_mark = ".")
)
str(Ca)
# 'data.frame': 82 obs. of 4 variables:
# $ Lebensmittel : chr "Basilikum, getrocknet" "Majoran, getrocknet" "Thymian, getrocknet" "Selleriesamen" ...
# $ Kategorie : chr "Gewürze" "Gewürze" "Gewürze" "Gewürze" ...
# $ Mengenangabe : chr "je 100 Gramm" "je 100 Gramm" "je 100 Gramm" "je 100 Gramm" ...
# $ Calciumgehalt.in.mg: num 2240 1990 1890 1767 1597 ...
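Since the question asks for something that scales to many such tables, here is a minimal sketch that applies the same conversion to every character column that parses cleanly as a number. The helper name and the keep-only-if-fully-parsed heuristic are my own assumptions, not part of the answer above:
parse_german_numbers <- function(df) {
  de <- readr::locale(decimal_mark = ",", grouping_mark = ".")
  df[] <- lapply(df, function(col) {
    if (!is.character(col)) return(col)
    parsed <- suppressWarnings(readr::parse_number(col, locale = de))
    # keep the parsed version only if every non-missing value parsed
    if (all(!is.na(parsed) | is.na(col))) parsed else col
  })
  df
}
# apply it to every table scraped from the page
tables <- lapply(html_table(page, fill = TRUE), as.data.frame)
tables <- lapply(tables, parse_german_numbers)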

How to import multiple files into a list while keeping their names?

I am reading several SAS files from a server and loading them all into a list in R. I removed one of the datasets because I didn't need it in the final analysis (dataset #31):
mylist <- list.files("path", pattern = ".sas7bdat")
mylist <- mylist[-31]
Then I used lapply to read all the datasets in the list (mylist) at the same time:
library(haven) # provides read_sas()
read.all <- lapply(mylist, read_sas)
The code works well. However, when I run View(read.all) to see the datasets, I only see a number (e.g. 1, 2, etc.) instead of the names of the original datasets.
Does anyone know how I can keep the names of the datasets in the final list?
Also, can anyone tell me how I can work with this list in R?
Is it an object? Can I read one of the datasets in the list? And how can I join some of the datasets in the list?
Use basename and tools::file_path_sans_ext:
filenames <- head(list.files("~/StackOverflow", pattern = "^[^#].*\\.R", recursive = TRUE, full.names = TRUE))
filenames
# [1] "C:\\Users\\r2/StackOverflow/1000343/61469332.R" "C:\\Users\\r2/StackOverflow/10087004/61857346.R"
# [3] "C:\\Users\\r2/StackOverflow/10097832/60589834.R" "C:\\Users\\r2/StackOverflow/10214507/60837843.R"
# [5] "C:\\Users\\r2/StackOverflow/10215127/61720149.R" "C:\\Users\\r2/StackOverflow/10226369/60778116.R"
basename(filenames)
# [1] "61469332.R" "61857346.R" "60589834.R" "60837843.R" "61720149.R" "60778116.R"
tools::file_path_sans_ext(basename(filenames))
# [1] "61469332" "61857346" "60589834" "60837843" "61720149" "60778116"
somedat <- setNames(lapply(filenames, readLines, n=2),
tools::file_path_sans_ext(basename(filenames)))
names(somedat)
# [1] "61469332" "61857346" "60589834" "60837843" "61720149" "60778116"
str(somedat)
# List of 6
# $ 61469332: chr [1:2] "# https://stackoverflow.com/questions/61469332/determine-function-name-within-that-function/61469380" ""
# $ 61857346: chr [1:2] "# https://stackoverflow.com/questions/61857346/how-to-use-apply-family-instead-of-nested-for-loop-for-my-problem?noredirect=1" ""
# $ 60589834: chr [1:2] "# https://stackoverflow.com/questions/60589834/add-columns-to-data-frame-based-on-function-argument" ""
# $ 60837843: chr [1:2] "# https://stackoverflow.com/questions/60837843/how-to-remove-all-parentheses-from-a-vector-of-string-except-whe"| __truncated__ ""
# $ 61720149: chr [1:2] "# https://stackoverflow.com/questions/61720149/extracting-the-original-data-based-on-filtering-criteria" ""
# $ 60778116: chr [1:2] "# https://stackoverflow.com/questions/60778116/how-to-shift-data-by-a-factor-of-two-months-in-r" ""
Each "name" is the character representation of (in this case) the stackoverflow question number, with the ".R" removed. (And since I typically include the normal URL as the first line then an empty line in the files I use to test/play and answer SO questions, all of these files look similar at the top two lines.)

R package 'haven' read_spss: how to make it ignore value labels?

I have an SPSS file. I read it in using 'haven' package:
library(haven)
spss1 <- read_spss("SPSS_Example.sav")
I created a function that extracts the long labels (called "Label" in SPSS):
fix_labels <- function(x, TextIfMissing) {
  val <- attr(x, "label")
  if (is.null(val)) TextIfMissing else val
}
longlabels <- sapply(spss1, fix_labels, TextIfMissing = "NO LABEL IN SPSS")
It looks like a little bug in 'haven': when I look at the attributes of a variable that has no long label in SPSS but does have value labels, I get:
attr(spss1$WAVE, "label")
NULL
But when I sapply my function over the data frame and print the long labels for each column, I get this for the same column "WAVE", instead of NULL:
NULL
VERY/SOMEWHAT FAMILIAR NOT AT ALL FAMILIAR
1 2
This is, of course, incorrect: the function grabs the next attribute (which one?) and replaces NULL with it.
The function is supposed to create a vector of long labels, and usually it does, e.g.:
str(longlabels)
Named chr [1:64] "Serial number" ...
- attr(*, "names")= chr [1:64] "Respondent_Serial" "weight" "r7_1" "r7_2" ...
However, I just got an SPSS file with 92 columns and ran exactly the same function on it. Now I get not a vector but a list:
str(longlabels)
List of 92
$ VEHRATED : chr "VEHICLE RATED"
$ RESPID : chr "RESPONDENT ID"
$ RESPID8 : chr "8 DIGIT RESPONDENT NUMBER"
An observation about the structure of longlabels here: for the columns that do NOT have a long label in SPSS but DO have value labels, my function grabs their value labels, so the "long label" is recorded as a named numeric vector, e.g.:
$ AWARE2 : Named num [1:2] 1 2
..- attr(*, "names")= chr [1:2] "VERY/SOMEWHAT FAMILIAR" "NOT AT ALL FAMILIAR"
Question: how can I avoid extracting the value labels for the columns that have no long labels?
Here is the solution. The problem was partial matching in attr(): by default, attr(x, "label") also matches a "labels" attribute when no attribute is named exactly "label", so exact = TRUE is required:
fix_labels <- function(x, TextIfMissing) {
  val <- attr(x, "label", exact = TRUE)
  if (is.null(val)) TextIfMissing else val
}
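A small toy demonstration of the pitfall (the object and its labels are invented for illustration):
x <- 1:2
attr(x, "labels") <- c(YES = 1, NO = 2) # value labels only, no "label"
attr(x, "label")                # partial matching silently returns "labels"
# YES  NO
#   1   2
attr(x, "label", exact = TRUE)  # NULL, as intended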

Obtaining index names from a by object by parsing its call

I'm trying to create an as.data.frame.by method which basically melts the N-dimensional by object for use with latex.table.by.
Melting it is simple enough, since a by object is just an array, but the variable names that come back are the most undescriptive "X"s imaginable.
dat <- transform( ChickWeight, Time=cut(Time,3), Chick=cut(as.numeric(Chick),3) )
my.by <- by( dat, with(dat,list(Time,Chick,Diet)), function(x) sum(x$weight) )
Looking through attributes(my.by), the index variable names don't appear to be stored anywhere except in the call. I'd like to default to something reasonably descriptive for the table.
So that leaves parsing the call:
> attr(my.by,"call")
by.data.frame(data = dat, INDICES = with(dat, list(Time, Chick,
Diet)), FUN = function(x) sum(x$weight))
> str(attr(my.by,"call"))
language by.data.frame(data = dat, INDICES = with(dat, list(Time, Chick, Diet)), FUN = function(x) sum(x$weight))
I just want the index names used, but I have no idea how to go about parsing this monster. Ideas?
If you make the call with named arguments you get dimnames as you expect:
> my.by <- with(dat, by( weight, list(Time=Time,Chick=Chick,Diet=Diet), sum ))
> str(my.by)
by [1:3, 1:3, 1:4] 3475 5969 8002 640 1596 ...
- attr(*, "dimnames")=List of 3
..$ Time : chr [1:3] "(-0.021,6.99]" "(6.99,14]" "(14,21]"
..$ Chick: chr [1:3] "(0.951,17.3]" "(17.3,33.7]" "(33.7,50]"
..$ Diet : chr [1:4] "1" "2" "3" "4"
- attr(*, "call")= language by.default(data = weight, INDICES = list(Time = Time, Chick = Chick, Diet = Diet), FUN = sum)
For the original call as given, this will work:
as.character(tail(as.list(attr(my.by, "call")[["INDICES"]]), 1)[[1]])[-1]
tail(..., 1)[[1]] grabs list(Time, Chick, Diet), and [-1] drops the list symbol.
Hm, the wild guess of attr(my.by,"call")[["INDICES"]] seems to produce a language object.
And coercing that to character works surprisingly well:
> as.character(attr(my.by,"call")[["INDICES"]])
[1] "with" "dat" "list(Time, Chick, Diet)"
So I could probably grab it from there, although it will remain highly dependent on how the user specifies it. Better parsing ideas would be most appreciated.
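One more option, my own suggestion rather than anything from the answers above: all.vars() pulls the variable names out of the INDICES expression however it is written, and the variables of the data argument can be subtracted out (this still assumes the index names differ from the data's name):
idx <- attr(my.by, "call")[["INDICES"]]
dat_arg <- attr(my.by, "call")[["data"]]
setdiff(all.vars(idx), all.vars(dat_arg))
# [1] "Time"  "Chick" "Diet"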

Writing a Simple Triplet Matrix to a File?

I am using the tm package to compute a term-document matrix for a dataset, and I now have to write the term-document matrix to a file, but when I use the write functions in R I get an error.
Here is the code I am using and the error I am getting:
data("crude")
tdm <- TermDocumentMatrix(crude, control = list(weighting = weightTfIdf, stopwords = TRUE))
dtm <- DocumentTermMatrix(crude, control = list(weighting = weightTfIdf, stopwords = TRUE))
And this is the error when I use the write.table command on this data:
Error in cat(list(...), file, sep, fill, labels, append) : argument 1 (type 'list') cannot be handled by 'cat'
I understand that tdm is an object of class simple triplet matrix, but how can I write it to a plain text file?
I think I might be misunderstanding the question, but if all you want to do is export the term document matrix to a file, then how about this:
m <- inspect(tdm)
DF <- as.data.frame(m, stringsAsFactors = FALSE)
write.table(DF)
Is that what you're after mate?
Hope that helps a little,
Tony Breyal
Should the file be "human-readable"? If not, use dump, dput, or save. If so, convert your list into a data.frame.
Edit: You can convert your list into a matrix if the list elements are of equal length, by doing matrix(unlist(list.name), nrow = length(list.name[[1]])) or something like that (or with plyr), as illustrated below.
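As a toy illustration of that unlist/matrix idea (the list here is invented, not the asker's data):
lst <- list(a = 1:3, b = 4:6)
matrix(unlist(lst), nrow = length(lst[[1]]))
#      [,1] [,2]
# [1,]    1    4
# [2,]    2    5
# [3,]    3    6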
Why aren't you doing your SVM analysis in R (e.g. with kernlab)?
Edit 2: OK, I looked at your data, and it isn't easy to convert into a matrix because the list elements aren't of equal length:
> is.list(tdm)
[1] TRUE
> str(tdm)
List of 7
$ i : int [1:1475] 15 29 151 152 173 205 215 216 227 228 ...
$ j : int [1:1475] 1 1 1 1 1 1 1 1 1 1 ...
$ v : Named num [1:1475] 3.32 4.32 2.32 2 2.32 ...
..- attr(*, "names")= chr [1:1475] "1.50" "16.00" "barrel," "barrel." ...
$ nrow : int 985
$ ncol : int 20
$ dimnames :List of 2
..$ Terms: chr [1:985] "(bpd)" "(bpd)." "(gcc)" "(it) appears to be nearing a crossroads with regard to\nderegulation, both as it pertains to investments and imports," ...
..$ Docs : chr [1:20] "127" "144" "191" "194" ...
$ Weighting: chr [1:2] "term frequency - inverse document frequency" "tf-idf"
- attr(*, "class")= chr [1:2] "TermDocumentMatrix" "simple_triplet_matrix"
In order to convert this to a matrix, you will need to either take elements of this list (e.g. i, j) or else do some other manipulation.
Edit 3: Just to conclude my commentary here: these objects are intended to be used with the inspect function (see the package vignette).
As discussed, in order to use a function like write.table, you will need to convert your list into a matrix, which requires some manipulation of that list such that you have several vectors of equal length. Looking at the structure of these tm objects: this will be very difficult to do, and I suggest you work with the helper functions that are included with that package.
dtmMatrix <- as.matrix(dtm)
write.csv(dtmMatrix, 'mydata.csv')
This certainly does the job. However, when I tried it on a very large DTM (25,000 by 35,000), it gave errors about running out of memory.
I used the following method:
dtm <- DocumentTermMatrix(corpus)
dtm1 <- removeSparseTerms(dtm, 0.998) ## max allowed sparsity 0.998
m <- inspect(dtm1)
DF <- as.data.frame(m, stringsAsFactors = FALSE)
write.csv(DF,"mydata0.998sparse.csv")
This reduced the size of the document-term matrix to a great extent!
You can increase the maximum allowed sparsity (closer to 1) to include more terms in DF.
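If even as.matrix runs out of memory, one sketch that avoids densifying altogether is to write the sparse triplet representation itself; the column names below are my own choice, not from the answers above:
# each row of the output is one non-zero cell of the matrix
triplets <- data.frame(term   = tdm$dimnames$Terms[tdm$i],
                       doc    = tdm$dimnames$Docs[tdm$j],
                       weight = unname(tdm$v))
write.csv(triplets, "tdm_triplets.csv", row.names = FALSE)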
