Stacked table spreading and merging in R

I downloaded the SKOS schema table from W3C to prepare for a vocabulary mapping task. Here is an example built with "dput":
> dput(skosc)
structure(list(X1 = c("skos:Collection", "URI:", "Definition:",
"Label:", "Disjoint classes:", "skos:Concept", "URI:", "Definition:",
"Label:", "Disjoint classes:", "skos:ConceptScheme", "URI:",
"Definition:", "Label:", "Disjoint classes:", "skos:OrderedCollection",
"URI:", "Definition:", "Label:", "Super-classes:"), X2 = c("skos:Collection",
"http://www.w3.org/2004/02/skos/core#Collection", "Section 9. \r\n Concept Collections",
"Collection", "skos:Conceptskos:ConceptScheme", "skos:Concept",
"http://www.w3.org/2004/02/skos/core#Concept", "Section 3. The \r\n skos:Concept Class",
"Concept", "skos:Collectionskos:ConceptScheme", "skos:ConceptScheme",
"http://www.w3.org/2004/02/skos/core#ConceptScheme", "Section 4. \r\n Concept Schemes",
"Concept Scheme", "skos:Collectionskos:Concept", "skos:OrderedCollection",
"http://www.w3.org/2004/02/skos/core#OrderedCollection", "Section 9. \r\n Concept Collections",
"Ordered Collection", "skos:Collection")), .Names = c("X1", "X2"
), class = "data.frame", row.names = c(NA, -20L))
There is an oddity in this stacked table, besides the subtitle of each small table (such as "skos:Collection", "skos:Concept", and so on), which we must notice: the row names are not all the same either. For example, row 20 is named "Super-classes:", not "Disjoint classes:" as in the small tables above.
My plan is to split this stacked table and transpose it as follows:
Before: [image: the original stacked two-column table]
After: [image: the desired wide table, one row per vocabulary]
"dplyr" and "tidyr" are both good at manipulating tables, and I chose the "spread" function, which can reshape a table from long and narrow to short and wide. Unfortunately, it failed:
> skosns<-"http://www.w3.org/2009/08/skos-reference/skos.html"
> require(rvest)
Loading required package: rvest
Loading required package: xml2
> skospg<-read_html(skosns, encoding = "UTF-8", options = c("RECOVER", "NOERROR", "NSCLEAN"))
> skosnd<-html_nodes(skospg, "table")
> skosc<-html_table(skosnd[[1]], header = FALSE, trim = TRUE, fill = FALSE, dec = ".")
> skosp<-html_table(skosnd[[2]], header = FALSE, trim = TRUE, fill = FALSE, dec = ".")
> require(tidyr)
Loading required package: tidyr
> spread(skosc, key = X1, value = X2)
Error: Duplicate identifiers for rows (3, 8, 13, 18), (5, 10, 15), (4, 9, 14, 19), (2, 7, 12, 17)
The error message didn't tell me much about the cause, but I guess the odd row leads to this error. Can we ignore the differences among the small tables and only spread the same values into different columns?
Question Updated:
The code posted by akrun in the comments is very helpful. I learned that if one column contains two or more rows with the same value, we need to group them and mutate in an index first; then the data frame can be spread. Thanks, akrun!
Now comes the last step: delete the columns of vocabulary names (such as "skos:Collection") and transfer them to the corresponding rows. But I am weak at writing inline functions, so the program failed, unsurprisingly:
> require(rvest)
> skospg<-read_html(skosns, encoding = "UTF-8", options = c("RECOVER", "NOERROR", "NSCLEAN"))
> skosnd<-html_nodes(skospg, "table")
> skosc<-html_table(skosnd[[1]], header = FALSE, trim = TRUE, fill = FALSE, dec = ".")
> require(dplyr)
> skosc_g<-group_by(skosc, X1)
> skosc_m<-mutate(skosc_g, n = row_number())
> require(tidyr)
Loading required package: tidyr
> skosc_t<-spread(skosc_m, key = X1, value = X2)
> vocn<-select_all(skosc_t, funs(colnames=grep("[[:alpha:]]+:[[:alpha:]]+")))
Error in grep("[[:alpha:]]+:[[:alpha:]]+") :
argument "x" is missing, with no default
> merge.data.frame(vocn, skosc_t, by=c("Collection", "Concept", "ConceptScheme"))
Error in as.data.frame(x) : object 'vocn' not found
The plan of this paragraph is to extract the columns whose values are those vocabulary names {skosc_t[c(5,6,7,8),]}, then merge them with the data frame from which these columns have been deleted {skosc_t[c(2,3,4,9,10),]}.
What is the correct way to do this? Thanks a lot.
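A minimal sketch of that extraction idea, assuming the skosc_t produced above: dplyr's matches() selector can take the same regular expression, which avoids the bare grep() call that failed (the merge step itself is still the open question):
require(dplyr)
# Columns named like "skos:Something" hold the vocabulary names;
# matches() takes the regex that grep() was meant to apply to colnames.
vocn <- select(ungroup(skosc_t), matches("[[:alpha:]]+:[[:alpha:]]+"))
# The remaining columns, with the vocabulary-name columns dropped.
rest <- select(ungroup(skosc_t), -matches("[[:alpha:]]+:[[:alpha:]]+"))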

Related

Is it possible to hide the footnote number reference in the flextable footer section and replace it with my own text?

I have a flextable:
df = data.frame(col1 = c(123,234,54,5), col2 = c(NA,1,2,3), col3 = 5:8)
df %>%
flextable() %>%
footnote(i = 1, j = 1,
value = as_paragraph('this is a footnote'),
ref_symbols = "1",
part = "body") %>%
footnote(i = 1, j = 1,
value = as_paragraph(as_b("Note (2):"), "This is another footnote"),
ref_symbols = "2",
part = "body")
that shows: [image: the rendered table with the two footnotes]
What I would like to do is keep the flextable and footnotes, but remove the little 1 and 2 that appear in the footer section.
You can use add_footer_lines() to add any content on a new line, and append_chunks() to append any content to existing cells/rows/columns (prepend_chunks() can sometimes be useful).
library(flextable)
df <- data.frame(col1 = c(123,234,54,5), col2 = c(NA,1,2,3), col3 = 5:8)
df |>
flextable() |>
add_footer_lines("Here is a foonote") |>
add_footer_lines(as_paragraph(as_b("Note (2):"),"This is another foonote")) |>
append_chunks(i = 1, j = 1, as_sup("12")) |>
append_chunks(i = 2, j = 1, as_sup("‡"))
David Gohel's previous answer is fine for avoiding this behavior of adding the symbol both in the cell and in the footnote, but that answer will not add "1" or "2" in the cells of the table. In that case, you have to add these symbols yourself in the cells, since the footnote() function requires a "value" parameter (or you get Error in mapply(function(x, y) { : argument "value" is missing, with no default).
In order to add ¹, ², etc. in the cells of the table, I suggest using a superscript generator (e.g., here) and paste0() to modify the cells and add the symbol ¹ to the data; then use add_footer_lines() to create the footnote that you want, without the symbol at the beginning of the footnote (see David Gohel's previous answer).
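For instance, a minimal sketch of that paste0() approach on the example data above (the Unicode superscript and the cell chosen are illustrative):
library(flextable)
df <- data.frame(col1 = c(123, 234, 54, 5), col2 = c(NA, 1, 2, 3), col3 = 5:8)
# Put the superscript mark directly into the cell value...
df$col1 <- as.character(df$col1)
df$col1[1] <- paste0(df$col1[1], "\u00b9")  # appends the character "¹"
# ...then add the footnote text as a plain footer line, without a ref symbol.
df |>
flextable() |>
add_footer_lines("\u00b9 this is a footnote")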
PS: for good readability of the "ref_symbols", I prefer to use letters when the symbol appears next to a number (e.g., I use "† ", "‡ ", "A" or "B"), and I add a number footnote symbol when it comes next to text (e.g., ¹ or ²).

How can I transform JSON to a data frame?

I'm working with R code in RStudio. I have 450 JSON files, all in my workspace. Some JSON files are fine, but with some files like this one (https://drive.google.com/file/d/1DsezCmN8_8iLNCAsLZiRnxTrwnWu6LkD/view?usp=sharing , a 296 KB JSON), when I try to turn the field tm into a data frame I get this error:
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 0, 1
The code that I use is:
JSONList <- rjson::fromJSON(file = "2.json", simplify = F)
DF <- as.data.frame(JSONList$tm)
With the files that are OK, I obtain 1 observation of 5168 variables.
How can I avoid this problem with some files?
Thanks
Another possibility I'm considering is to select only the fields I need:
candidatos = list(
"name",
"score",
"tot_sFieldGoalsMade",
"tot_sFieldGoalsAttempted",
"tot_sTwoPointersMade",
"tot_sTwoPointersAttempted",
"tot_sThreePointersMade",
"tot_sThreePointersAttempted",
"tot_sFreeThrowsMade",
"tot_sFreeThrowsAttempted",
"tot_sReboundsDefensive",
"tot_sReboundsOffensive",
"tot_sReboundsTotal",
"tot_sAssists",
"tot_sBlocks",
"tot_sTurnovers",
"tot_sFoulsPersonal",
"tot_sPointsInThePaint",
"tot_sPointsSecondChance",
"tot_sPointsFromTurnovers",
"tot_sBenchPoints",
"tot_sPointsFastBreak",
"tot_sSteals"
)
ListColum<-map(candidatos, function(x){
as.data.frame(data$tm$"2"$x)
} )
But R gives me a list of 23 data frames with no elements.
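One likely culprit, for what it's worth: inside the anonymous function, data$tm$"2"$x looks for an element literally named "x", so every lookup returns NULL; [[ with the variable does the intended lookup. A sketch under the assumption that data is the parsed JSON list from the first snippet:
require(purrr)
# Assumption: `data` is the parsed JSON, e.g.
# data <- rjson::fromJSON(file = "2.json", simplify = FALSE)
ListColum <- map(candidatos, function(x) {
# `[[x]]` looks up the element whose name is the *value* of x;
# `$x` would look for an element literally called "x".
as.data.frame(data$tm$"2"[[x]])
})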

Reading an SPSS data file in R

I am using the expss package:
df<-read_spss("test.SAV")
It shows the following:
Warning message: In foreign::read.spss(enc2native(file), use.value.labels = FALSE, : Tally.SAV: Very long string record(s) found (record type 7, subtype 14), each will be imported in consecutive separate variables
It shows 4174 variables in the Environment panel, while the actual number of variables in the data file is around 400.
Can anyone please help me with this?
As mentioned in the comment, foreign::read.spss splits long SPSS character variables (>255 chars) into several columns. If such columns are empty, you can drop them without any issues.
A convenience function for this:
remove_empty_characters_after_foreign = function(data){
    # character columns that contain nothing but NA
    empty_chars = vapply(data, FUN = function(column) is.character(column) & all(is.na(column)), FUN.VALUE = logical(1))
    # columns whose names end in "00<digit>" - the continuation parts created by read.spss
    additional_chars = grepl("00\\d$", colnames(data), perl = TRUE)
    to_remove = empty_chars & additional_chars
    if(any(to_remove)){
        message(paste0("Removing ", paste(colnames(data)[to_remove], collapse = ", "), "..."))
    }
    data[, !to_remove, drop = FALSE]
}
df = remove_empty_characters_after_foreign(df)

Convert column values to percentages

I am reading in multiple Excel files into lists using read.xlsx from the openxlsx package. I append the lists with rbind and perform some data manipulation.
What I need to do is convert the values in columns 18 and 19 to percentages (currently the values show as .90, .85, etc., but I can also force the user to enter them as 90, 85, etc.; I need them to show as 90%, 85%). I have tried to do this inside the data.frame and also using createStyle. So far, nothing has worked: the attempts either corrupt my data or simply do nothing.
Here is what I have tried...
openxlsx Style
# Create percent style
pct = createStyle(numFmt = "0%")
# Apply style
addStyle(wb, sheet = "filename", style = pct, cols = 18, rows = 102, gridExpand = TRUE)
str_replace
allData <- str_replace(allData$'Content', pattern = "%", "")
allData$'Content' <- as.numeric(allData)/100
sapply (even just trying to convert the data type to numeric didn't work; it was still set to General):
allData[, c(18)] <- sapply(allData[, c(18)], as.numeric)
Any help would be greatly appreciated!
I figured this out some time ago but forgot to post the answer. For those who are interested...
# Create a percent style
pct = createStyle(numFmt = "0%")
# Add percent style
addStyle(wb, sheet = "my_filename", style = pct, cols = c(18, 19), rows = 2:(nrow(allData)+1), gridExpand = TRUE)
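For completeness, a minimal end-to-end sketch around that style (the workbook, sheet name, and output file name are illustrative; allData is the combined data frame from above):
library(openxlsx)
wb <- createWorkbook()
addWorksheet(wb, "my_filename")
writeData(wb, sheet = "my_filename", x = allData)
# Values such as 0.90 stay numeric; the cell format renders them as 90%.
pct <- createStyle(numFmt = "0%")
addStyle(wb, sheet = "my_filename", style = pct, cols = c(18, 19), rows = 2:(nrow(allData)+1), gridExpand = TRUE)
saveWorkbook(wb, "my_filename.xlsx", overwrite = TRUE)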

Benford - Dataset with NA strings returns an error in extract.digits

I have a dataset of macroeconomic data like GDP, inflation, etc., where rows are different macroeconomic indicators and columns are years.
Since some values are missing (e.g., the GDP of some country in some year), they are coded as "NA".
When I perform these operations:
#
data = read.table("14varnumeros.txt", header = FALSE, sep = "", na.strings = "NA", dec = ".", strip.white = TRUE)
benford(data, number.of.digits = 1, sign = "both", discrete=TRUE, round=3)
#
It gives me this error:
Error in extract.digits(data, number.of.digits, sign, second.order, discrete = discrete, :
Data must be a numeric vector
I assume that this is because of the NA strings, but I do not know how to solve it.
I came across this issue, too. In my case it wasn't missing data; instead it was a quirk in the extract.digits() function of the benford.analysis package. The function checks whether the data supplied to it is numeric, but it does so using class(dat) != "numeric" instead of the is.numeric() function.
This produces unexpected errors. Consider the code below:
library(benford.analysis)
dat <- data.frame(v1 = 1:5, v2 = c(1, 2, 3, 4, 5))
benford(dat$v1) # produces error
I've submitted an issue on GitHub, but you can simply wrap your data in as.numeric() and you should be fine.
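A sketch of that workaround on the same toy data (number.of.digits = 1 is illustrative):
library(benford.analysis)
dat <- data.frame(v1 = 1:5, v2 = c(1, 2, 3, 4, 5))
# v1 is stored as integer, so class(dat$v1) != "numeric" trips the check;
# coercing with as.numeric() satisfies it.
bfd <- benford(as.numeric(dat$v1), number.of.digits = 1)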
