python: count values of dictionary - counting

I'm working with Python and I need to find out how many values each key in a dictionary has.
This is my dictionary:
{2: [(1, 1)], 3: [(2, 1), (2, 1)], 4: [(1, 3), (3, 1), (3, 1)], 5: [(1, 4), (2, 3), (2, 3), (4, 1)], 6: [(1, 5), (2, 4), (2, 4), (3, 3), (3, 3)], 7: [(1, 6), (2, 5), (2, 5), (3, 4), (3, 4), (4, 3)], 8: [(2, 6), (2, 6), (3, 5), (3, 5), (4, 4)], 9: [(1, 8), (3, 6), (3, 6), (4, 5)], 10: [(2, 8), (2, 8), (4, 6)], 11: [(3, 8), (3, 8)], 12: [(4, 8)]}
The result should map each key to the number of values it holds, like this:
{2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 7: 6, 8: 5, 9: 4, 10: 3, 11: 2, 12: 1}
I've been trying to make this work for a long time.
Thanks in advance.

Not sure if this is what you're looking for?
def getcountvalue(d):
    # Build a new dict mapping each key to the length of its list of values.
    newdict = {}
    for key, value in d.items():
        newdict[key] = len(value)
    return newdict
d = {2: [(1, 1)], 3: [(2, 1), (2, 1)], 4: [(1, 3), (3, 1), (3, 1)], 5: [(1, 4), (2, 3), (2, 3), (4, 1)], 6: [(1, 5), (2, 4), (2, 4), (3, 3), (3, 3)], 7: [(1, 6), (2, 5), (2, 5), (3, 4), (3, 4), (4, 3)], 8: [(2, 6), (2, 6), (3, 5), (3, 5), (4, 4)], 9: [(1, 8), (3, 6), (3, 6), (4, 5)], 10: [(2, 8), (2, 8), (4, 6)], 11: [(3, 8), (3, 8)], 12: [(4, 8)]}
newd = getcountvalue(d)
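The same counting can also be written as a dict comprehension. This is only a minimal sketch of the idea in the answer above; the shortened dictionary is sample data taken from the first three keys of the question, and the name counts is chosen here for illustration.

```python
# Sample data: the first three keys of the dictionary from the question.
d = {2: [(1, 1)], 3: [(2, 1), (2, 1)], 4: [(1, 3), (3, 1), (3, 1)]}

# Map each key to the number of items in its list of values.
counts = {key: len(values) for key, values in d.items()}

print(counts)  # {2: 1, 3: 2, 4: 3}
```

Note that len counts every item in the list, so repeated tuples such as (2, 1) are counted each time they appear, which matches the expected output in the question.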

Related

Removing a row based on a condition

my_df <- tibble(
b1 = c(2, 1, 1, 2, 2, 2, 1, 1, 2),
b2 = c(NA, 4, 6, 2, 6, 6, 1, 1, 7),
b3 = c(5, 9, 8, NA, 2, 3, 9, 5, NA),
b4 = c(NA, 6, NA, 10, 12, 8, 3, 6, 2),
b5 = c(2, 12, 1, 7, 8, 5, 5, 6, NA),
b6 = c(9, 2, 4, 6, 7, 6, 6, 7, 9),
b7 = c(1, 3, 7, 7, 4, 2, 2, 9, 5),
b8 = c(NA, 8, 4, 5, 1, 4, 1, 3, 6),
b9 = c(4, 5, NA, 9, 5, 1, 1, 2, NA),
b10 = c(14, 3, NA, 2, 2, 2, 3, NA, 5))
I have a df like this and would like to tell R to remove all rows with '3' or 'NA' in b10 if b1 = 1. I tried the following, but it seems to keep the '3' and 'NA' rows instead of removing them:
new_df <- my_df %>% filter(is.na(b10) | b10 == 3 | b1==1 & b10 ==NA)
This should do the trick:
library(dplyr)
my_df %>%
filter(!(b1 == 1 & (is.na(b10) | b10 == 3)))
Edit: this assumes you want to remove the rows where the conditions are met.

R reshape wide to long: multiple variables, observations with multiple indices

I have got some data containing observations with multiple indices $y_{ibc}$ stored in a messy wide format. I have been fiddling around with tidyr and reshape2 but could not figure it out (reshaping really is my nemesis).
Here is an example:
df <- structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9), a1b1c1 = c(5,
2, 1, 4, 3, 1, 0, 1, 3), a2b1c1 = c(3, 4, 1, 1, 3, 2, 1, 4, 4
), a3b1c1 = c(4, 0, 0, 1, 1, 1, 0, 0, 1), a1b2c1 = c(1, 0, 4,
2, 4, 1, 0, 4, 2), a2b2c1 = c(2, 0, 1, 0, 1, 0, 3, 2, 0), a3b2c1 = c(2,
4, 3, 0, 2, 3, 3, 3, 4), yc1 = c(1, 2, 2, 1, 2, 2, 2, 1, 1), a1b1c2 = c(4,
2, 3, 0, 4, 4, 2, 1, 4), a2b1c2 = c(3, 0, 3, 3, 4, 4, 3, 2, 2
), a3b1c2 = c(3, 1, 0, 1, 4, 0, 2, 2, 3), a1b2c2 = c(2, 2, 0,
3, 2, 1, 4, 1, 0), a2b2c2 = c(3, 0, 2, 3, 4, 4, 4, 0, 4), a3b2c2 = c(0,
0, 0, 2, 0, 0, 1, 4, 3), yc2 = c(2, 2, 2, 1, 2, 2, 2, 1, 1), X = c(5,
6, 3, 7, 4, 3, 2, 3, 2)), row.names = c(NA, -9L), class = c("tbl_df",
"tbl", "data.frame"))
This is what I want (excerpt):
id b c y a1 a2 a3 X
1 1 b1 c1 1 5 3 4 5
2 1 b2 c1 1 1 2 2 5
3 1 b1 c2 2 4 3 3 5
4 1 b2 c2 2 2 3 0 5
Using tidyr & dplyr:
library(tidyverse)
df %>%
pivot_longer(cols = matches("a.b.c."), names_to = "name", values_to = "value") %>%
separate(name, into = c("a", "b", "c"), sep = c(2,4)) %>%
mutate(y = case_when(c == "c1" ~ yc1,
c == "c2" ~ yc2)) %>%
pivot_wider(names_from = a, values_from = value) %>%
select(id, b, c, y, a1, a2, a3, X)
First, convert all your a/b/c columns to a long format and separate the 3 values into separate columns. Then combine your y columns into one depending on the value of c using mutate and case_when (you could also use if_else for two options, but case_when extends more easily to more values). Then pivot your a columns back to wide format and use select to put them in the right order and get rid of the yc1 and yc2 columns.

Creating a new variable based on numeric differences between two other variables in r

Here's an example dataset.
structure(list(vector1 = c(1, 4, 4, 2, 1, 3, 2, 3, 4, 5, 3, 5,
5, 1, 4, 2, 4, 5, 2, 5), vector2 = c(4, 2, 3, 5, 3, 5, 2, 2,
3, 3, 4, 1, 4, 1, 2, 1, 2, 1, 1, 2)), class = "data.frame", row.names = c(NA,
-20L))
Basically what I'm trying to do is create a new variable 'Direction' based on differences between these numbers. I want to say something like:
if vector2 == vector1 or vector2 == vector1 +/- 1 then Direction == 'NS'
if vector2 < vector1 - 1 or vector2 > vector1 + 1 then Direction == 'EW'
Hopefully this makes sense. Thanks!
A similar solution is this (slightly simpler):
Data:
df <- data.frame(
vector1 = c(1, 4, 4, 2, 1, 3, 2, 3, 4, 5, 3, 5, 5, 1, 4, 2, 4, 5, 2, 5),
vector2 = c(4, 2, 3, 5, 3, 5, 2, 2, 3, 3, 4, 1, 4, 1, 2, 1, 2, 1, 1, 2)
)
Desired new column:
df$direction <- ifelse(df$vector1 == df$vector2 |
df$vector1 == df$vector2 + 1 |
df$vector1 == df$vector2 - 1, "NS", "EW")
Outcome:
df
vector1 vector2 direction
1 1 4 EW
2 4 2 EW
3 4 3 NS
4 2 5 EW
5 1 3 EW
6 3 5 EW
7 2 2 NS
8 3 2 NS
9 4 3 NS
10 5 3 EW
11 3 4 NS
12 5 1 EW
13 5 4 NS
14 1 1 NS
15 4 2 EW
16 2 1 NS
17 4 2 EW
18 5 1 EW
19 2 1 NS
20 5 2 EW
You can try this:
df <- structure(list(vector1 = c(1, 4, 4, 2, 1, 3, 2, 3, 4, 5, 3, 5,
5, 1, 4, 2, 4, 5, 2, 5), vector2 = c(4, 2, 3, 5, 3, 5, 2, 2,
3, 3, 4, 1, 4, 1, 2, 1, 2, 1, 1, 2)), class = "data.frame", row.names = c(NA,
-20L))
df$direction <- with(df,ifelse((vector2 == vector1) | (vector2 == (vector1 + 1)) | (vector2 == (vector1 - 1)), "NS",
ifelse(vector2 < (vector1-1) | (vector2 > (vector1 + 1)),"EW", NA)))

When mutate_all and lapply disagree ... How to replace lapply with mutate_all

I'm here again to ask for your help!
I'm trying to figure out what's happening with mutate_all (or with me...).
Let's say I have this dataset:
ds <- structure(list(Q1 = structure(c(5, 4, 5, 5, 5, 5, 5, 5, 5, 5,
5, 4, 3, 5, 5, 5, 5, 5, 1, 4, 5, 5, 3, 4, 5, 5, 5, 5, 5, 2, 5,
5, 4, 5, 5, 3, 5, 5, 4, 3, 3, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4,
5, 4), label = "1 Para mim é igual se os meus amigos são heterossexuais ou homossexuais.", format.spss = "F1.0", display_width = 3L, class = "labelled", labels = c(`discordo totalmente` = 1,
discordo = 2, indiferente = 3, concordo = 4, `concordo totalmente` = 5
)), Q2 = structure(c(1, 1, 1, 1, 1, 1, 3, 1, 2, 3, 1, 4, 4, 4,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 1, 1, 3, 2,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 2), label = "A homossexualidade é uma perturbação psicológica/biológica.", format.spss = "F1.0", display_width = 5L, class = "labelled", labels = c(`discordo totalmente` = 1,
discordo = 2, indiferente = 3, concordo = 4, `concordo totalmente` = 5
)), Q3 = structure(c(5, 2, 5, 4, 5, 4, 5, 5, 5, 4, 5, 5, 2, 3,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 5, 4, 5, 4), label = "Acredito que os pais e as mães homossexuais são tão competentes como os pais e mães heterossexuais.", format.spss = "F1.0", display_width = 5L, class = "labelled", labels = c(`discordo totalmente` = 1,
discordo = 2, indiferente = 3, concordo = 4, `concordo totalmente` = 5
)), Q4 = structure(c(1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 2,
1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 5, 1, 1, 2, 1, 3), label = "4 Todas as Lésbicas, Gays, Bissexuais, Transexuais, Transgêneros e Intersexuais (LGBTI) me deixam irritado.", format.spss = "F1.0", display_width = 4L, class = "labelled", labels = c(`discordo totalmente` = 1,
discordo = 2, indiferente = 3, concordo = 4, `concordo totalmente` = 5
)), Q5 = structure(c(1, 4, 1, 1, 1, 1, 3, 1, 2, 1, 1, 1, 3, 3,
1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 3, 2,
1, 1, 1, 2, 2, 5, 1, 4, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 3), label = "A legalização do casamento entre pessoas do mesmo sexo é muito errada.", format.spss = "F1.0", display_width = 5L, class = "labelled", labels = c(`discordo totalmente` = 1,
discordo = 2, indiferente = 3, concordo = 4, `concordo totalmente` = 5
))), row.names = c(NA, -54L), class = c("tbl_df", "tbl", "data.frame"
))
Then I need to transform all variables into factors to plot them. I really like the dplyr approach:
ds_mutate <- ds %>% mutate_all(., factor, levels=1:5)
likert(ds_mutate)
But this error is coming up:
Error in likert(ds_mutate) :
All items (columns) must have the same number of levels
When I use lapply (nobody will convince me 'apply' functions are intuitive...), it works pretty well:
> ds_apply <- lapply(ds, factor, levels=1:5) %>% as.data.frame()
> likert(ds_apply)
Item 1 2 3 4 5
1 Q1 1.851852 1.851852 9.259259 14.814815 72.222222
2 Q2 77.777778 9.259259 5.555556 7.407407 0.000000
3 Q3 0.000000 3.703704 1.851852 14.814815 79.629630
4 Q4 79.629630 14.814815 3.703704 0.000000 1.851852
5 Q5 72.222222 7.407407 14.814815 3.703704 1.851852
But as you can see, the str is (for me) the same...
I'm looking forward to hearing from you!
Thank you!
There is one difference:
class(ds_mutate)
# [1] "tbl_df" "tbl" "data.frame"
class(ds_apply)
# [1] "data.frame"
The issue then arises from the fact that, in the call of likert, we have
nlevels = length(levels(items[, 1]))
where, in the former case,
length(levels(ds_mutate[, 1]))
# [1] 0
since
ds_mutate[, 1]
# A tibble: 54 x 1
# Q1
# <fct>
# 1 5
# 2 4
# 3 5
# 4 5
# 5 5
# 6 5
# 7 5
# 8 5
# 9 5
# 10 5
# … with 44 more rows
i.e., the result is a tibble. Also,
methods("levels")
# [1] levels.default
so that there is no levels method for tibbles. Notice also that
class(ds_mutate) <- c("data.frame", "tbl_df", "tbl")
ds_mutate[, 1]
# [1] 5 4 5 5 5 5 5 5 5 5 5 4 3 5 5 5 5 5 1 4 5 5 3 4 5 5 5 5 5 2 5 5 4 5 5 3 5 5 4 3 3 5 5 5
# [45] 5 5 5 5 5 5 5 4 5 4
# Levels: 1 2 3 4 5
in which case
likert(ds_mutate)
starts to work too. Without modifying classes you may also use
likert(data.frame(ds_mutate))
Extra: lapply in
lapply(ds, factor, levels = 1:5)
actually is really intuitive once we understand one thing: a data frame is a special case of a list where each list element is of the same length. Now, the way sapply or lapply works is that it goes over each element of the first argument: once we see ds as a data frame whose elements (since it's a list) are columns, it becomes clear how it operates. For the same reason, since the results of factor in this case are all of the same length, the list resulting from the call to lapply can be converted neatly to a data frame.
I have never used the likert package, but it looks like it doesn't accept an object of class tibble. This works for me:
likert(as.data.frame(ds_mutate))

Reorganize a List with Lists in a new list in R

This is my list:
mylist=list(list(a = c(2, 3, 4, 5), b = c(3, 4, 5, 5), c = c(3, 7, 5,
5), d = c(3, 4, 9, 5), e = c(3, 4, 5, 9), f = c(3, 4, 1, 9),
g = c(3, 1, 5, 9), h = c(3, 3, 5, 9), i = c(3, 17, 3, 9),
j = c(3, 17, 3, 9)), list(a = c(2, 5, 48, 4), b = c(7, 4,
5, 5), c = c(3, 7, 35, 5), d = c(3, 843, 9, 5), e = c(3, 43,
5, 9), f = c(3, 4, 31, 39), g = c(3, 1, 5, 9), h = c(3, 3, 5,
9), i = c(3, 17, 3, 9), j = c(3, 17, 3, 9)), list(a = c(2, 3,
4, 35), b = c(3, 34, 5, 5), c = c(3, 37, 5, 5), d = c(38, 4,
39, 5), e = c(3, 34, 5, 9), f = c(33, 4, 1, 9), g = c(3, 1, 5,
9), h = c(3, 3, 35, 9), i = c(3, 17, 33, 9), j = c(3, 137, 3,
9)), list(a = c(23, 3, 4, 85), b = c(3, 4, 53, 5), c = c(3, 7,
5, 5), d = c(3, 4, 9, 5), e = c(3, 4, 5, 9), f = c(3, 34, 1,
9), g = c(38, 1, 5, 9), h = c(3, 3, 5, 9), i = c(3, 137, 3, 9
), j = c(3, 17, 3, 9)), list(a = c(2, 3, 48, 5), b = c(3, 4,
5, 53), c = c(3, 73, 53, 5), d = c(3, 43, 9, 5), e = c(33, 4,
5, 9), f = c(33, 4, 13, 9), g = c(3, 81, 5, 9), h = c(3, 3, 5,
9), i = c(3, 137, 3, 9), j = c(3, 173, 3, 9)))
As you can see, my list has 5 entries. Each entry has 10 sub-entries, each holding 4 elements.
> mylist[[4]][[1]]
[1] 23 3 4 85
I want to create another list with only one entry.
I want to put all entries of type mylist[[i]][[1]] in the first position of a new list: mynewlist[[1]][[1]] will be filled by the mylist[[1]][[1]], mylist[[2]][[1]], mylist[[3]][[1]], mylist[[4]][[1]], mylist[[5]][[1]] elements.
The second position of mynewlist (mynewlist[[2]][[1]]) will be the mylist[[1]][[2]], mylist[[2]][[2]], mylist[[3]][[2]], mylist[[4]][[2]], mylist[[5]][[2]] elements.
And so on, until:
The fifth position of mynewlist (mynewlist[[5]][[1]]) will be the mylist[[1]][[5]], mylist[[2]][[5]], mylist[[3]][[5]], mylist[[4]][[5]], mylist[[5]][[5]] elements.
In other words, I want to put every mylist[[i]][[1]]$a in the mynewlist[[1]][[1]] position, every mylist[[i]][[1]]$b in the mynewlist[[1]][[2]] position, and so on until mylist[[i]][[1]]$j in the mynewlist[[1]][[10]] position.
This should be my output for the first position of mynewlist:
#[[1]]
#[1] 2 3 4 5
2 5 48 4
2 3 4 35
23 3 4 85
2 3 48 5
Any help?
We can use transpose
library(dplyr)
out <- mylist %>%
purrr::transpose(.)
out[[1]]
#[[1]]
#[1] 2 3 4 5
#[[2]]
#[1] 2 5 48 4
#[[3]]
#[1] 2 3 4 35
#[[4]]
#[1] 23 3 4 85
#[[5]]
#[1] 2 3 48 5
