How to create a data frame with rows of varying length in R?

I have a list named paths whose elements have different lengths, as shown below.
The code is:
for (each in paths) {
  print(each)
}
The output is:
[1] "1" "2"
[1] "1" "2" "3"
[1] "1" "2" "3" "5"
[1] "1" "2" "4"
[1] "1" "2" "4" "5"
[1] "1" "3"
[1] "1" "3" "5"
[1] "1" "4"
[1] "1" "4" "5"
[1] "1" "5"
[1] "2" "3"
[1] "2" "3" "5"
[1] "2" "4"
[1] "2" "4" "5"
[1] "3" "5"
[1] "4" "5"
How can I append all of these as rows of a data frame? as.data.frame fails because the rows have unequal lengths.

A data frame is rectangular by definition, with the same number of columns in each row. You could set the length of each of your rows to be the same (they will be filled in with NA), and then rbind them together:
maxlength = max(lengths(paths))
paths2 = lapply(paths, function(x) {length(x) = maxlength; return(x)})
paths_df = do.call(rbind, args = paths2)
That will give a matrix, but you can easily convert to data frame from there.
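As a minimal sketch of that last conversion step, assuming a small hypothetical paths list (three of the vectors printed in the question), the padded matrix can be wrapped in as.data.frame. The column names V1 to V4 are R's defaults, not something the answer above specifies.

```r
# Hypothetical sample: three of the paths printed in the question
paths <- list(c("1", "2"), c("1", "2", "3"), c("1", "2", "3", "5"))

# Pad every vector with NA up to the length of the longest one
maxlength <- max(lengths(paths))
padded <- lapply(paths, function(x) {
  length(x) <- maxlength
  x
})

# rbind yields a character matrix; as.data.frame converts it to a data frame
paths_df <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
print(paths_df)
```

Shorter paths end up with NA in the trailing columns, so downstream code has to tolerate missing values.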

A data.frame needs to be rectangular, and all elements of a given column need to be the same type of object. That still allows a data.frame column of type list, whose elements can vary in size.
paths=list(1,c(1,2))
df=data.frame("pathNumber"= 1:length(paths))
df$path=paths
The result looks like this:
  pathNumber path
1          1    1
2          2 1, 2

One option is to have the list as a column of a data frame. This may be desirable if you want to have some other columns.
df <- data.frame(paths = I(paths))
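A short sketch of how such a list column behaves, assuming a hypothetical two-element paths list and an extra id column added only for illustration:

```r
# Hypothetical sample data; the id column is only for illustration
paths <- list(c("1", "2"), c("1", "2", "3"))

# I() protects the list so it is stored as a single list column
df <- data.frame(id = seq_along(paths), paths = I(paths))

# Each row keeps its full path, whatever its length
second_path <- df$paths[[2]]
path_lengths <- lengths(df$paths)
print(second_path)
print(path_lengths)
```

Without I(), data.frame would try to spread the list elements across columns and fail on the unequal lengths.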


How can I create a new txt file by matching two different txt file and finding the same values in R?

I have two text files: File A and File B.
I want to match the first column of File A against the first row (the header) of File B.
Where a value in the first column of File A also appears in the first row of File B, I want to keep that column of File B, with all of its values, along with the first column of File B.
File A:
"...1" "AZD5153" "I-BET-762" "I-BRD9" "JQ1" "OTX-015" "PFI-1" "RVX-208"
"1" "697" 0.155445 1.328728 7.6345 7.553337 0.496983 1.776878 24.540592
"2" "5637" 11.767517 66.561037 314.672133 3.891947 17.54448 10.27559 261.520227
"3" "22RV1" 2.144765 9.04165 193.4228 4.448654 19.315063 9.55938 72.036416
"4" "23132-87" 1.882177 41.26784 33.482054 10.959235 9.025218 19.621473 75.332425
"5" "42-MG-BA" 2.252297 26.56874 54.934795 7.92924 10.276993 7.937254 64.873664
"6" "639-V" 6.412568 16.979172 30.882936 12.444024 21.915518 6.449247 96.50391
File B:
"...1" "1321N1" "143B" "22RV1" "23132-87" "42-MG-BA"
"1" "100009676_at" 61161 62052 61249 66154 54236
"2" "10000_at" 81556 66152 45676 43519 66723
"3" "10001_at" 97864 99699 8872 91376 10029
"4" "10002_at" 37977 40304 38455 37085 36431
"5" "10003_at" 35458 38504 40458 39508 41589
"6" "100048912_at" 40034 37959 41465 39271 39157
"7" "100049716_at" 42744 46775 52087 47239 42522
Expected File:
"...1" "22RV1" "23132-87" "42-MG-BA"
"1" "100009676_at" 61249 66154 54236
"2" "10000_at" 45676 43519 66723
"3" "10001_at" 8872 91376 10029
"4" "10002_at" 38455 37085 36431
"5" "10003_at" 40458 39508 41589
"6" "100048912_at" 41465 39271 39157
"7" "100049716_at" 52087 47239 42522
First of all, ensure you have the correct paths to FILEA.txt and FILEB.txt, as well as the desired path to FILEC.txt. In my case, I did:
path_to_file_A <- path.expand("~/FILEA.txt")
path_to_file_B <- path.expand("~/FILEB.txt")
path_to_file_C <- path.expand("~/FILEC.txt")
Now the following code should work:
A <- read.table(path_to_file_A, header = TRUE, check.names = FALSE)
B <- read.table(path_to_file_B, header = TRUE, check.names = FALSE)
result <- cbind(B[1], B[na.omit(match(A[[1]], names(B)))])
write.table(result, path_to_file_C)
Which results in:
FILEC.txt
"...1" "22RV1" "23132-87" "42-MG-BA"
"1" "100009676_at" 61249 66154 54236
"2" "10000_at" 45676 43519 66723
"3" "10001_at" 8872 91376 10029
"4" "10002_at" 38455 37085 36431
"5" "10003_at" 40458 39508 41589
"6" "100048912_at" 41465 39271 39157
"7" "100049716_at" 52087 47239 42522

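Returning to the file-matching question above, the column selection can also be sketched with intersect instead of match. This is only an illustrative variation: the data below is a cut-down, hypothetical subset of Files A and B supplied inline through the text argument, and the names shared and result are not from the original answer.

```r
# Cut-down stand-ins for File A and File B, supplied inline
A <- read.table(text = c(
  '"...1" "AZD5153" "JQ1"',
  '"1" "697" 0.155445 7.553337',
  '"2" "22RV1" 2.144765 4.448654',
  '"3" "42-MG-BA" 2.252297 7.92924'),
  header = TRUE, check.names = FALSE, stringsAsFactors = FALSE)

B <- read.table(text = c(
  '"...1" "1321N1" "22RV1" "42-MG-BA"',
  '"1" "10000_at" 81556 45676 66723',
  '"2" "10001_at" 97864 8872 10029'),
  header = TRUE, check.names = FALSE, stringsAsFactors = FALSE)

# Column names of B that also occur in the first column of A
shared <- intersect(names(B), A[[1]])

# Keep the identifier column of B plus the shared columns
result <- B[c(names(B)[1], shared)]
print(result)
```

Putting names(B) first in intersect keeps the columns in File B's order, whereas match(A[[1]], names(B)) orders them by their appearance in File A.
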
Separating A String Into Characters

I have some ordered test results encoded in a character string. The string can be of arbitrary length. Each digit in the string represents a test result. In the following, for example, there are four test results represented:
2069
I want to tidy these up in R by splitting the string into individual observations. No problem with strsplit or stringr::str_split, which returns four values that will become my observations.
strsplit("2069" %>% as.character(), split = "") %>% unlist()
[1] "2" "0" "6" "9"
Now, however, I have realized that some results are values greater than 9. These two-digit values have been encoded with parentheses to make clear they are not individual results.
For example, in the following case I still have four values, but some have been enclosed in parentheses to group the values larger than 9.
2(10)1(12)
I'm struggling with a way to break these up so that I get
[1] "2" "10" "1" "12"
Appreciate any guidance. Thanks.
Update: a pattern match based on the OP's new pattern shown in the comments. Here, we use str_extract_all to extract one or more digits that follow an opening parenthesis (a regex lookbehind), or (|) any single character that is not a parenthesis ([^()]).
library(stringr)
str_extract_all(str1, "(?<=[(])\\d+|[^()]")
[[1]]
[1] "2" "10" "1" "12"
[[2]]
[1] "2" "0" "6" "9"
[[3]]
[1] "2" "15"
[[4]]
[1] "2" "1" "3" "1"
Testing on the OP's extra pattern:
str_extract_all(str2, "(?<=[(])\\d+|[^()]")
[[1]]
[1] "2" "10" "1" "12"
[[2]]
[1] "2" "0" "6" "9"
[[3]]
[1] "2" "15"
[[4]]
[1] "2" "1" "3" "1"
[[5]]
[1] "10" "0" "2" "0" "1"
Earlier solutions (based on the assumption that all numbers greater than 9 are wrapped in parentheses):
We may split on the parentheses in base R:
unlist(strsplit(str1[1], "\\(|\\)"))
[1] "2" "10" "1" "12"
If both cases occur, an option is to get the index of the elements that have parentheses and handle them separately:
i1 <- grepl("\\(|\\)", str1)
lst1 <- vector('list', length(str1))
lst1[i1] <- strsplit(str1[i1], "\\(|\\)")
lst1[!i1] <- strsplit(str1[!i1], "")
unlist(lst1)
[1] "2" "10" "1" "12" "2" "0" "6" "9" "2" "15" "2" "1" "3" "1"
or another option is ifelse with grepl to create a single delimiter and then use strsplit
lst1 <- strsplit(trimws(ifelse(grepl("\\(|\\)", str1),
gsub("\\(|\\)", ",", str1), gsub("(?<=.)(?=.)", "\\1,\\2",
str1, perl = TRUE)), whitespace = ","), ",")
lst1
[[1]]
[1] "2" "10" "1" "12"
[[2]]
[1] "2" "0" "6" "9"
[[3]]
[1] "2" "15"
[[4]]
[1] "2" "1" "3" "1"
data
str1 <- c("2(10)1(12)", "2069", "2(15)", "2131")
str2 <- c(str1, "(10)0201")
Maybe we can do it like below (borrowing str1 from @akrun):
> mapply(strsplit, str1, ifelse(grepl("[()]", str1), "\\(|\\)", ""))
$`2(10)1(12)`
[1] "2" "10" "1" "12"
$`2069`
[1] "2" "0" "6" "9"
$`2(15)`
[1] "2" "15"
$`2131`
[1] "2" "1" "3" "1"
Use
(?<=\()\d+(?=\))|\d
See regex proof.
EXPLANATION
(?<=    look behind to see if there is:
  \(      '('
)       end of look-behind
\d+     digits (0-9) (1 or more times, matching the most amount possible)
(?=     look ahead to see if there is:
  \)      ')'
)       end of look-ahead
|       OR
\d      digits (0-9)
R code:
library(stringr)
str1 <- c("2(10)1(12)", "2069", "2(15)", "2131")
str_extract_all(str1, "(?<=\\()\\d+(?=\\))|\\d")
Results:
[[1]]
[1] "2" "10" "1" "12"
[[2]]
[1] "2" "0" "6" "9"
[[3]]
[1] "2" "15"
[[4]]
[1] "2" "1" "3" "1"
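The same pattern can also be tried without stringr. The following is a base R sketch using regmatches and gregexpr; perl = TRUE is needed because the pattern uses lookarounds, and the variable names pattern and res are only illustrative.

```r
str1 <- c("2(10)1(12)", "2069", "2(15)", "2131")

# Digits wrapped in parentheses are taken as one value, any other digit alone
pattern <- "(?<=\\()\\d+(?=\\))|\\d"

# gregexpr finds every match per string; regmatches extracts them as a list
res <- regmatches(str1, gregexpr(pattern, str1, perl = TRUE))
print(res)
```

Each list element can then be converted with as.integer if numeric results are wanted.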

Favstats (R-Mosaic) returns an error due to non numeric variables although variables are only numeric

I am trying to get Favstats to work.
I am using a "normal" Dataset with numeric variables that I have loaded in with:
ALLBUS2018 <- read.csv("~/Desktop/ALLBUS2018.csv", sep="")
When I use Favstats on one of the variables the following happens:
fav_stats(~ ep01, data = ALLBUS2018, na.rm = TRUE)
Error in fav_stats(~ep01, data = ALLBUS2018, na.rm = TRUE) :
'pairlist' object cannot be coerced to type 'double'
In addition: Warning message:
In fav_stats(~ep01, data = ALLBUS2018, na.rm = TRUE) :
Auto-converting formula to numeric.
I have re-installed the dataset and removed R completely.
A friend of mine gets a correct output with the same input and no other data in R.
I have tried as.numeric and sapply(ALLBUS2018, function(txt) eval(parse(text=txt)))
Here you can find the data used:
https://www.dropbox.com/s/fa9hplvk2j6q1cl/ALLBUS2018.csv?dl=0
Thanks for your help!
HS
You're making two mistakes:
1. Your file is not a .csv; it's plain text delimited by spaces rather than commas. For this reason, read.csv is returning a column vector of massive strings.
2. Your syntax is wrong inside mosaic::fav_stats. You should pass ALLBUS2018$ep01 rather than ~ep01, data = ALLBUS2018, which is interpreted as fav_stats(x = ~ep01, data = ALLBUS2018). In this case, x is the wrong type (a formula object) and data is passed as an additional argument via ... and subsequently ignored. Check the help via ?mosaic::favstats for more info on this.
This code should work
The names in your file are hard to read through the default read.table methods, so I've done that in a separate step.
require("mosaic")
csv_file <- ('"V1" "V2" "V3" "V4" "V5" "V6" "V7" "V8" "V9" "V10"
"1" NA "za_nr" "doi" "version" "respid" "eastwest" "german" "ep01" "ep03" "ep04"
"2" 1 "5270" "doi:10.4232/1.13250" "2.0.0 (2019-03-26)" "1" "1" "1" "1" "2" "2"
"3" 2 "5270" "doi:10.4232/1.13250" "2.0.0 (2019-03-26)" "2" "2" "1" "2" "4" "3"
"4" 3 "5270" "doi:10.4232/1.13250" "2.0.0 (2019-03-26)" "3" "1" "1" "2" "2" "3"
"5" 4 "5270" "doi:10.4232/1.13250" "2.0.0 (2019-03-26)" "4" "2" "1" "2" "2" "3"
"6" 5 "5270" "doi:10.4232/1.13250" "2.0.0 (2019-03-26)" "5" "2" "1" "3" "2" "3"
"7" 6 "5270" "doi:10.4232/1.13250" "2.0.0 (2019-03-26)" "6" "1" "1" "1" "3" "3"
"8" 7 "5270" "doi:10.4232/1.13250" "2.0.0 (2019-03-26)" "7" "1" "1" "3" "2" "3"
"9" 8 "5270" "doi:10.4232/1.13250" "2.0.0 (2019-03-26)" "8" "1" "1" "2" "3" "3"
"10" 9 "5270" "doi:10.4232/1.13250" "2.0.0 (2019-03-26)" "9" "1" "1" "1" "2" "4"')
ALLBUS2018 <- read.table(text = csv_file, sep = " ") # <- for the purpose of this example
# ALLBUS2018 <- read.table(file = "ALLBUS2018.csv", sep = " ") <- what you should do
### Fix row & colnames
colnames(ALLBUS2018) <- ALLBUS2018[1,]
ALLBUS2018 <- ALLBUS2018[-1,]
rownames(ALLBUS2018) <- ALLBUS2018[,1]
ALLBUS2018 <- ALLBUS2018[,-1]
# This syntax is wrong:
try(mosaic::fav_stats(~ep01, data = ALLBUS2018, na.rm = TRUE))
#> Warning in mosaic::fav_stats(~ep01, data = ALLBUS2018, na.rm = TRUE): Auto-
#> converting formula to numeric.
#> Error in mosaic::fav_stats(~ep01, data = ALLBUS2018, na.rm = TRUE) :
#> 'language' object cannot be coerced to type 'double'
# This syntax is right:
mosaic::fav_stats(ALLBUS2018$ep01, na.rm = TRUE)
#> Warning in mosaic::fav_stats(ALLBUS2018$ep01, na.rm = TRUE): Auto-converting
#> character to numeric.
#> min Q1 median Q3 max mean sd n missing
#> 1 1 2 2 3 1.888889 0.781736 9 0
Created on 2021-01-23 by the reprex package (v0.3.0)
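The "Auto-converting character to numeric" warning above appears because the columns were read as character. As a minimal base R sketch, assuming only the ep01 and ep03 values from the nine example rows (the data frame name answers is hypothetical), the columns can be converted explicitly before summarising:

```r
# ep01 and ep03 as they arrive from the example above: character, not numeric
answers <- data.frame(
  ep01 = c("1", "2", "2", "2", "3", "1", "3", "2", "1"),
  ep03 = c("2", "4", "2", "2", "2", "3", "2", "3", "2"),
  stringsAsFactors = FALSE
)

# Convert every column to numeric in place, keeping the data frame structure
answers[] <- lapply(answers, as.numeric)

str(answers)
ep01_mean <- mean(answers$ep01)
ep01_sd <- sd(answers$ep01)
print(c(mean = ep01_mean, sd = ep01_sd))
```

With numeric columns, passing answers$ep01 to fav_stats should no longer need the automatic conversion.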

Convert a vector of integers to a vector of strings

toString seems to convert a whole vector to a single string -
toString(c(1,2))
[1] "1, 2"
how does one map the string conversion over each element; i.e. for the above example, to obtain ("1", "2") ?
> as.character(c(1,2))
[1] "1" "2"
This is the output I get from the R console.
Since the result of toString is a character vector with a single element, applying as.character to it has no effect. You need to use scan:
> scan(text = toString(0:11), sep="," )
Read 12 items
[1] 0 1 2 3 4 5 6 7 8 9 10 11
Then you can use as.character if that is needed:
> res <- scan(text = toString(0:11), sep="," )
Read 12 items
> as.character(res)
[1] "0" "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11"
I prefer paste0 since it's shorter and (from what I can tell) accomplishes the same thing as as.character:
> paste0(1:2)
[1] "1" "2"
> identical(paste0(1:2),as.character(1:2))
[1] TRUE
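The difference between toString and the element-wise conversions can be checked directly. A small sketch, assuming a hypothetical input vector x:

```r
x <- c(1, 2, 10)  # hypothetical input

joined <- toString(x)          # one string holding all values
per_element <- as.character(x) # one string per value

print(length(joined))       # number of strings produced by toString
print(length(per_element))  # number of strings produced by as.character
print(per_element)
```

as.character is already vectorised, so no explicit mapping over the elements is needed.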

Iterating over characters of string R

Could somebody explain to me why this does not print all the numbers separately in R?
numberstring <- "0123456789"
for (number in numberstring) {
print(number)
}
Aren't strings just arrays of chars? What's the way to do it in R?
In R "0123456789" is a character vector of length 1.
If you want to iterate over the characters, you have to split the string into
a vector of single characters using strsplit.
numberstring <- "0123456789"
numberstring_split <- strsplit(numberstring, "")[[1]]
for (number in numberstring_split) {
print(number)
}
# [1] "0"
# [1] "1"
# [1] "2"
# [1] "3"
# [1] "4"
# [1] "5"
# [1] "6"
# [1] "7"
# [1] "8"
# [1] "9"
Just for fun, here are a few other ways to split a string at each character.
x <- "0123456789"
substring(x, 1:nchar(x), 1:nchar(x))
# [1] "0" "1" "2" "3" "4" "5" "6" "7" "8" "9"
regmatches(x, gregexpr(".", x))[[1]]
# [1] "0" "1" "2" "3" "4" "5" "6" "7" "8" "9"
scan(text = gsub("(.)", "\\1 ", x), what = character())
# [1] "0" "1" "2" "3" "4" "5" "6" "7" "8" "9"
Also possible with stringr::str_split:
library(stringr)
numberstring <- "0123456789"
str_split(numberstring, boundary("character"))
# [[1]]
# [1] "0" "1" "2" "3" "4" "5" "6" "7" "8" "9"
Here's a naive approach for iterating a string using a for loop and substring. This isn't any better than existing answers for the common case, but it might be useful if you want to break out of the loop early instead of always traversing the entire string once up front, as str_split/scan/substring(x, 1:nchar(x), 1:nchar(x))/regmatches requires.
s <- "0123456789"
if (s != "") {
for (i in 1:nchar(s)) {
print(substring(s, i, i))
}
}
The if is needed to avoid looping backwards from 1 to 0, inclusive of both ends.
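One way to drop the guard is seq_len, which yields an empty sequence for an empty string. The sketch below also shows the early exit mentioned above; the stop character "5" and the seen vector are purely illustrative.

```r
s <- "0123456789"
seen <- character(0)

# seq_len(0) is empty, so the loop body never runs for an empty string
for (i in seq_len(nchar(s))) {
  ch <- substring(s, i, i)
  if (ch == "5") break  # stop early without touching the rest of the string
  seen <- c(seen, ch)
}
print(seen)

# With an empty string the loop simply does nothing
empty_count <- 0
for (i in seq_len(nchar(""))) {
  empty_count <- empty_count + 1
}
print(empty_count)
```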
Your question is not 100% clear as to the desired outcome (print each character individually from a string, or store each number in a way that the given print loop will result in each number being produced on its own line).
To store numberstring such that it prints using the loop you included:
numberstring<-c(0,1,2,3,4,5,6,7,8,9)
for(number in numberstring){print(number);}
[1] 0
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
