Flatten a named list in R - r

This is a very simple question, I can't believe I can't figure it out. I've searched high and low for a solution.
I have a named list, like so:
> fitted(mdl)
1 2 3 4 5 6 7 8
-424.8135 -395.0308 -436.5832 -414.3145 -382.9686 -380.7277 -394.2808 -394.3340
9 10 11 12 13 14 15 16
-401.6710 -386.6691 -407.4558 -427.4056 -397.4963 -415.6302 -436.1703 -378.4489
17 18 19 20 21 22 23 24
-353.7718 -377.3190 -390.5177 -370.3608 -389.7843 -397.8872 -401.9937 -390.4119
25 26 27 28 29 30 31 32
-387.4962 -422.4953 -427.1638 -402.5654 -409.6334 -360.7378 -355.1824 -370.9121
33 34 35 36 37 38 39 40
-377.6591 -373.3049 -388.4417 -398.1172 -357.1107 -376.8618 -378.7070 -420.5362
41 42 43 44 45 46 47 48
-390.8324 -406.5956 -403.1015 -363.5008 -347.2580 -371.0433 -376.4454 -360.3895
49
-383.9711
mdl is an object returned from lm(), and I'm trying to extract the predicted values using the extractor function fitted()
I would like this to be without the 1,2,3,... names. str() told me that names is an attribute. I can do
> names(fitted(mdl))
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15"
[16] "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28" "29" "30"
[31] "31" "32" "33" "34" "35" "36" "37" "38" "39" "40" "41" "42" "43" "44" "45"
[46] "46" "47" "48" "49"
And that is what I want, except with the data. After trying various combinations of unlist,cbind/rbind, do.call, c(), etc. I finally figured out a solution:
> data.frame(fitted(mdl))$fitted.mdl
[1] -424.8135 -395.0308 -436.5832 -414.3145 -382.9686 -380.7277 -394.2808
[8] -394.3340 -401.6710 -386.6691 -407.4558 -427.4056 -397.4963 -415.6302
[15] -436.1703 -378.4489 -353.7718 -377.3190 -390.5177 -370.3608 -389.7843
[22] -397.8872 -401.9937 -390.4119 -387.4962 -422.4953 -427.1638 -402.5654
[29] -409.6334 -360.7378 -355.1824 -370.9121 -377.6591 -373.3049 -388.4417
[36] -398.1172 -357.1107 -376.8618 -378.7070 -420.5362 -390.8324 -406.5956
[43] -403.1015 -363.5008 -347.2580 -371.0433 -376.4454 -360.3895 -383.9711
But this is a very roundabout hack for something that must be right under my nose.
Any suggestions at what I'm missing?
(I don't know how to phrase the problem very well, or come up with a better title for the question, as I don't know the terminology to describe what I want. So feel free to edit :)

If you're just trying to remove the names of the object, just use unname.
Here's a basic example:
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
weight <- c(ctl, trt)
lm.D9 <- lm(weight ~ group)
fitted(lm.D9)
# 1 2 3 4 5 6 7 8 9 10 11 12 13
# 5.032 5.032 5.032 5.032 5.032 5.032 5.032 5.032 5.032 5.032 4.661 4.661 4.661
# 14 15 16 17 18 19 20
# 4.661 4.661 4.661 4.661 4.661 4.661 4.661
Remove the names:
unname(fitted(lm.D9))
# [1] 5.032 5.032 5.032 5.032 5.032 5.032 5.032 5.032 5.032 5.032 4.661 4.661 4.661
# [14] 4.661 4.661 4.661 4.661 4.661 4.661 4.661

here is another simple way:
set.seed(100)
x <- rpois(5, 5)
y <- 2*x + rnorm(5)
mod <- lm(y ~ x)
fitted_ <- fitted(mod)
fitted_
1 2 3 4 5
# 7.822806 6.312569 9.333042 4.802333 9.333042
names(fitted_) <- NULL
fitted_
# [1] 7.822806 6.312569 9.333042 4.802333 9.333042

Related

How to split a string after the nth character in r

I am working with the following data:
District <- c("AR01", "AZ03", "AZ05", "AZ08", "CA01", "CA05", "CA11", "CA16", "CA18", "CA21")
I want to split the string after the second character and put them into two columns.
So that the data looks like this:
state district
AR 01
AZ 03
AZ 05
AZ 08
CA 01
CA 05
CA 11
CA 16
CA 18
CA 21
Is there a simple code to get this done? Thanks so much for you help
You can use substr if you always want to split by the second character.
District <- c("AR01", "AZ03", "AZ05", "AZ08", "CA01", "CA05", "CA11", "CA16", "CA18", "CA21")
#split district starting at the first and ending at the second
state <- substr(District,1,2)
#split district starting at the 3rd and ending at the 4th
district <- substr(District,3,4)
#put in data frame if needed.
st_dt <- data.frame(state = state, district = district, stringsAsFactors = FALSE)
you could use strcapture from base R:
strcapture("(\\w{2})(\\w{2})",District,
data.frame(state = character(),District = character()))
state District
1 AR 01
2 AZ 03
3 AZ 05
4 AZ 08
5 CA 01
6 CA 05
7 CA 11
8 CA 16
9 CA 18
10 CA 21
where \\w{2} means two words
The OP has written
I'm more familiar with strsplit(). But since there is nothing to split
on, its not applicable in this case
Au contraire! There is something to split on and it's called lookbehind:
strsplit(District, "(?<=[A-Z]{2})", perl = TRUE)
The lookbehind works like "inserting an invisible break" after 2 capital letters and splits the strings there.
The result is a list of vectors
[[1]]
[1] "AR" "01"
[[2]]
[1] "AZ" "03"
[[3]]
[1] "AZ" "05"
[[4]]
[1] "AZ" "08"
[[5]]
[1] "CA" "01"
[[6]]
[1] "CA" "05"
[[7]]
[1] "CA" "11"
[[8]]
[1] "CA" "16"
[[9]]
[1] "CA" "18"
[[10]]
[1] "CA" "21"
which can be turned into a matrix, e.g., by
do.call(rbind, strsplit(District, "(?<=[A-Z]{2})", perl = TRUE))
[,1] [,2]
[1,] "AR" "01"
[2,] "AZ" "03"
[3,] "AZ" "05"
[4,] "AZ" "08"
[5,] "CA" "01"
[6,] "CA" "05"
[7,] "CA" "11"
[8,] "CA" "16"
[9,] "CA" "18"
[10,] "CA" "21"
We can use str_match to capture first two characters and the remaining string in separate columns.
stringr::str_match(District, "(..)(.*)")[, -1]
# [,1] [,2]
# [1,] "AR" "01"
# [2,] "AZ" "03"
# [3,] "AZ" "05"
# [4,] "AZ" "08"
# [5,] "CA" "01"
# [6,] "CA" "05"
# [7,] "CA" "11"
# [8,] "CA" "16"
# [9,] "CA" "18"
#[10,] "CA" "21"
With the tidyverse this is very easy using the function separate from tidyr:
library(tidyverse)
District %>%
as.tibble() %>%
separate(value, c("state", "district"), sep = "(?<=[A-Z]{2})")
# A tibble: 10 × 2
state district
<chr> <chr>
1 AR 01
2 AZ 03
3 AZ 05
4 AZ 08
5 CA 01
6 CA 05
7 CA 11
8 CA 16
9 CA 18
10 CA 21
Treat it as fixed width file, and import:
# read fixed width file
read.fwf(textConnection(District), widths = c(2, 2), colClasses = "character")
# V1 V2
# 1 AR 01
# 2 AZ 03
# 3 AZ 05
# 4 AZ 08
# 5 CA 01
# 6 CA 05
# 7 CA 11
# 8 CA 16
# 9 CA 18
# 10 CA 21

data frame search == not finding all conditions that hold

I am trying to conditionally replace some fields in a dataframe; however, my code is finding about 25% of the actual instances present. I've searched through the other conditional search questions, but didn't find anything matching my problem -- I apologize in advance if I missed one.
Specifically, I am trying to replace all numbers 1 to 9 in dta$day, with a to i.
Here are the first 100 items in that vector: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 1 2 3 4 5 6 7 8 9
When I conditionally search for values 1 to 9, using:
dta$day == c("1","2","3","4","5","6","7","8","9")
It states that only the first and last set in that grouping match my condition as below (I've bolded ~what should be TRUE for your reference):
[1] **TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE** FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[17] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE **FALSE**
[33] **FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE** FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE **FALSE FALSE**
[65] **FALSE FALSE FALSE FALSE FALSE FALSE FALSE** FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[81] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE **TRUE TRUE TRUE TRUE TRUE TRUE**
[97] **TRUE TRUE TRUE**
The problem must be in that first step, but to show you the result, only the first and last set in that first 100 in my vector are appropriately replaced after applying this code:
dta[dta$day == c("1","2","3","4","5","6","7","8","9"),1
] <- c("a", "b", "c", "d", "e", "f", "g", "h", "i")
[1] **"a" "b" "c" "d" "e" "f" "g" "h" "i"** "10" "11" "12" "13" "14" "15" "16" "17" "18" "19"
[20] "20" "21" "22" "23" "24" "25" "26" "27" "28" "29" "30" "31" **"1" "2" "3" "4" "5" "6" "7"**
[39] "8" "9" "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26"
[58] "27" "28" **"1" "2" "3" "4" "5" "6" "7" "8" "9" "10"** "11" "12" "13" "14" "15" "16" "17"
[77] "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28" "29" "30" "31" **"a" "b" "c" "d" "e"
[96] "f" "g" "h" "i"**
If useful, here is the initial state of that vector:
is.numeric(dta$day)
[1] TRUE
summary(dta$day)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.00 8.00 16.00 15.73 23.00 31.00
I am reproducing the data frame here:
day <- c(1:31,1:28,1:31,1:30)
month <- c(rep_len(1,31),rep_len(2,28),rep_len(3,31),rep_len(4,30))
temp <- rnorm(length(month),10,10)
dta=as.data.frame(cbind(day,month,temp))
And actually, although I am able to reproduce the problem with this toy example, I get a warning that I do not get with my actual data (not reproduced here because it is very large): "longer object length is not a multiple of shorter object length".
I would love some help, and if I haven't provided something or haven't done so in the format needed, please kindly let me know!
It looks like you're checking equivalence to a vector, rather than it's components. Try %in% instead, like this:
dta[dta$day %in% c("1","2","3","4","5","6","7","8","9"), ]
Use %in% rather than == and then index your data frame/vector as below to replace 1:9 with a:i as wanted:
y <- c(1:9)
dta$day[dta$day %in% y] <- letters[1:length(y)]
Read more about the different behaviours of these operators here:
Difference between the == and %in% operators in R
And
Difference between `%in%` and `==`

Creating vectors from regular expressions in a column name

I have a dataframe, in which the columns represent species. The species affilation is encoded in the column name's suffix:
Ac_1234_AnyString
The string after the second underscore (_) represents the species affiliation.
I want to plot some networks based on rank correlations, and i want to color the species according to their species affiliation, later when i create fruchtermann-rheingold graphs with library(qgraph).
Ive done it previously by sorting the df by the name_suffix and then create vectors by manually counting them:
list.names <- c("SG01", "SG02")
list <- vector("list", length(list.names))
names(list) <- list.names
list$SG01 <- c(1:12)
list$SG02 <- c(13:25)
str(list)
List of 2
$ SG01 : int [1:12] 1 2 3 4 5 6 7 8 9 10 ...
$ SG02 : int [1:13] 13 14 15 16 17 18 19 20 21 22 ...
This was very tedious for the big datasets i am working with.
Question is, how can i avoid the manual sorting and counting, and extract vectors (or a list) according to the suffix and the position in the dataframe. I know i can create a vector with the suffix information by
indx <- gsub(".*_", "", names(my_data))
str(indx)
chr [1:29]
"4" "6" "6" "6" "6" "6" "11" "6" "6" "6" "6" "6" "3" "18" "6" "6" "6" "5" "5"
"6" "3" "6" "3" "6" "NA" "6" "5" "4" "11"
Now i would need to create vectors with the position of all "4"s, "6"s and so on:
List of 7
$ 4: int[1:2] 1 28
$ 6: int[1:17] 2 3 4 5 6 8 9 10 11 12 15 16 17 20 22 24 26
$ 11: int[1:2] 7 29
....
Thank you.
you can try:
sapply(unique(indx), function(x, vec) which(vec==x), vec=indx)
# $`4`
# [1] 1 28
# $`6`
# [1] 2 3 4 5 6 8 9 10 11 12 15 16 17 20 22 24 26
# $`11`
# [1] 7 29
# $`3`
# [1] 13 21 23
# $`18`
# [1] 14
# $`5`
# [1] 18 19 27
# $`NA`
# [1] 25
Another option is
setNames(split(seq_along(indx),match(indx, unique(indx))), unique(indx))

Access the levels of a factor in R

I have a 5-level factor that looks like the following:
tmp
[1] NA
[2] 1,2,3,6,11,12,13,18,20,21,22,26,29,33,40,43,46
[3] NA
[4] NA
[5] 5,9,16,24,35,36,42
[6] 4,7,10,14,15,17,19,23,25,27,28,30,31,32,34,37,38,41,44,45,47,48,49,50
[7] 8,39
5 Levels: 1,2,3,6,11,12,13,18,20,21,22,26,29,33,40,43,46 ...
I want to access the items within each level except NA. So I use the levels() function, which gives me:
> levels(tmp)
[1] "1,2,3,6,11,12,13,18,20,21,22,26,29,33,40,43,46"
[2] "4,7,10,14,15,17,19,23,25,27,28,30,31,32,34,37,38,41,44,45,47,48,49,50"
[3] "5,9,16,24,35,36,42"
[4] "8,39"
[5] "NA"
Then I would like to access the elements in each level, and store them as numbers. However, for example,
>as.numeric(cat(levels(tmp)[3]))
5,9,16,24,35,36,42numeric(0)
Can you help me removing the commas within the numbers and the numeric(0) at the very end. I would like to have a vector of numerics 5, 9, 16, 24, 35, 36, 42 so that I can use them as indices to access a data frame. Thanks!
You need to use a combination of unlist, strsplit and unique.
First, recreate your data:
dat <- read.table(text="
NA
1,2,3,6,11,12,13,18,20,21,22,26,29,33,40,43,46
NA
NA
5,9,16,24,35,36,42
4,7,10,14,15,17,19,23,25,27,28,30,31,32,34,37,38,41,44,45,47,48,49,50
8,39")$V1
Next, find all the unique levels, after using strsplit:
sort(unique(unlist(
sapply(levels(dat), function(x)unlist(strsplit(x, split=",")))
)))
[1] "1" "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "2" "20" "21" "22" "23" "24" "25" "26"
[20] "27" "28" "29" "3" "30" "31" "32" "33" "34" "35" "36" "37" "38" "39" "4" "40" "41" "42" "43"
[39] "44" "45" "46" "47" "48" "49" "5" "50" "6" "7" "8" "9"
Does this do what you want?
levels_split <- strsplit(levels(tmp), ",")
lapply(levels_split, as.numeric)
Using Andrie's dat
val <- scan(text=levels(dat),sep=",")
#Read 50 items
split(val,cumsum(c(T,diff(val) <0)))
#$`1`
#[1] 1 2 3 6 11 12 13 18 20 21 22 26 29 33 40 43 46
#$`2`
#[1] 4 7 10 14 15 17 19 23 25 27 28 30 31 32 34 37 38 41 44 45 47 48 49 50
#$`3`
#[1] 5 9 16 24 35 36 42
#$`4`
#[1] 8 39

reformatting data frame with List in R

Helo, I am trying to reshape a data.frame in R such that each row will repeat with a different value from a list, then the next row will repeat from a differing value from the second entry of the list.
the list is called, wrk, dfx is the dataframe I want to reshape, and listOut is what I want to end up with.
Thank you very much for your help.
> wrk
[[1]]
[1] "41" "42" "44" "45" "97" "99" "100" "101" "102"
[10] "103" "105" "123" "124" "126" "127" "130" "132" "135"
[19] "136" "137" "138" "139" "140" "141" "158" "159" "160"
[28] "161" "162" "163" "221" "223" "224" ""
[[2]]
[1] "41" "42" "44" "45" "98" "99" "100" "101" "102"
[10] "103" "105" "123" "124" "126" "127" "130" "132" "135"
[19] "136" "137" "138" "139" "140" "141" "158" "159" "160"
[28] "161" "162" "163" "221" "223" "224" ""
>dfx
projectScore highestRankingGroup
1 0.8852 1
2 0.8845 2
>listOut
projectScore highestRankingGroup wrk
1 0.8852 1 41
2 0.8852 1 42
3 0.8852 1 44
4 0.8852 1 45
5 0.8852 1 97
6 0.8852 1 99
7 0.8852 1 100
8 0.8852 1 101
...
35 0.8845 2 41
36 0.8845 2 42
37 0.8845 2 44
38 0.8845 2 45
39 0.8845 2 98
40 0.8845 2 99
41 0.8845 2 100
How about replicate rows of dfx and cbind with unlisted wrk:
listOut <- cbind(
dfx[rep(seq_along(wrk), sapply(wrk, length)), ],
wrk = unlist(wrk)
)
How about:
If wrk contains simple vectors like in your example:
> szs<-sapply(wrk, length)
> fulldfr<-do.call(c, wrk)
> listOut<-cbind(dfx[rep(seq_along(szs), szs),], fulldfr)
If wrk contains dataframes:
> szs<-sapply(wrk, function(dfr){dim(dfr)[1]})
> fulldfr<-do.call(rbind, wrk)
> listOut<-cbind(dfx[rep(seq_along(szs), szs),], fulldfr)
How about:
expand.grid(dfx$projectScore, dfx$highestRankingGroup, wrk[[1]])
Edit:
Maybe you can eleborate a bit more, because this does seem to work:
a <- c("41","42","44","45","97","99","100","101","102","103","105", "123","124","126","127","130","132","135","136","137","138","139","140","141","158","159","160","161","162","163","221","223","224")
wrk <-list(a, a)
dfx <- data.frame(projectScore=c(0.8852, 0.8845), highestRankingGroup=c(1,2))
listOut <- expand.grid(dfx$projectScore, dfx$highestRankingGroup, wrk[[1]])
names(listOut) <- c("projectScore", "highestRankingGroup", "wrk")
listOut[order(-listOut$projectScore,listOut$highestRankingGroup, listOut$wrk),]

Resources