pivot_longer with names_pattern [duplicate] - r

This question already has an answer here:
How to use Pivot_longer to reshape from wide-type data to long-type data with multiple variables
(1 answer)
Closed 1 year ago.
I am quite new to programming, but I need to write a reproducible script for large datasets. I hope I have provided a sufficient example.
I have a dataframe like this (with 8 more "Nutrients", 5 more "trade elements" and many more years):
Year<-c(1961,1962)
Total_Energy_kcal_Production<-c(5,8)
Total_Energy_kcal_Import<-c(6,1)
Total_Ca_g_Production<-c(3,4)
Total_Ca_g_Import<-c(3,8)
df<-cbind(Year,Total_Energy_kcal_Production, Total_Energy_kcal_Import, Total_Ca_g_Production, Total_Ca_g_Import)
looks like:
Year Total_Energy_kcal_Production Total_Energy_kcal_Import Total_Ca_g_Production Total_Ca_g_Import
1961 5 6 3 3
1962 8 1 4 8
and I want it to look like this:
Year Nutrient Production Import
1961 Total_Energy_kcal 5 6
1962 Total_Energy_kcal 8 1
1961 Total_Ca_g 3 3
1962 Total_Ca_g 4 8
I tried a lot with pivot_longer and names_pattern. I thought this would work, but I do not fully understand the arguments:
df_piv<-df%>%
pivot_longer(cols = -Year, names_to = "Nutrient",
names_pattern = ".*(?=_)")
I get an error message that I cannot interpret:
Error: Can't select within an unnamed vector.

You can provide the names_pattern regex as follows:
tidyr::pivot_longer(df,
cols = -Year,
names_to = c('Nutrient', '.value'),
names_pattern = '(.*)_(\\w+)')
# Year Nutrient Production Import
# <dbl> <chr> <dbl> <dbl>
#1 1961 Total_Energy_kcal 5 6
#2 1961 Total_Ca_g 3 3
#3 1962 Total_Energy_kcal 8 1
#4 1962 Total_Ca_g 4 8
This puts everything up to the last underscore into the Nutrient column; the part after it (Production or Import) becomes the name of a value column, which is what the special '.value' entry in names_to requests.
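To see how the pattern splits a column name, the two capture groups can be tried on the names alone. This is only an illustrative sketch in base R: it uses sub() with perl = TRUE and adds ^ and $ anchors, which is not necessarily how tidyr applies the pattern internally, but the greedy (.*) behaves the same way and backtracks to the last underscore.

```r
# Two of the column names from the question
nms <- c("Total_Energy_kcal_Production", "Total_Ca_g_Import")

# Same two capture groups as in names_pattern, anchored for use with sub()
pattern <- "^(.*)_(\\w+)$"

# Group 1: everything before the last underscore (goes to "Nutrient")
nutrient <- sub(pattern, "\\1", nms, perl = TRUE)

# Group 2: everything after the last underscore (becomes the value column name)
element <- sub(pattern, "\\2", nms, perl = TRUE)

nutrient
element
```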
data
cbind will create a matrix; use data.frame to create the data instead.
df<-data.frame(Year,Total_Energy_kcal_Production,Total_Energy_kcal_Import,
Total_Ca_g_Production, Total_Ca_g_Import)
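A quick way to check which kind of object you have before reshaping is sketched below. It reuses two of the vectors from the question; the object names m and d are only placeholders for this illustration.

```r
Year <- c(1961, 1962)
Total_Ca_g_Production <- c(3, 4)

# cbind() on plain vectors returns a numeric matrix
m <- cbind(Year, Total_Ca_g_Production)

# data.frame() returns the object that pivot_longer() expects
d <- data.frame(Year, Total_Ca_g_Production)

is.matrix(m)
is.data.frame(m)
is.data.frame(d)
```

If the data already exists as a matrix, as.data.frame(m) converts it without rebuilding it from the vectors.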

Related

Transforming big dataframe in more sensible form [duplicate]

This question already has answers here:
How to reshape data from long to wide format
(14 answers)
Reshaping wide to long with multiple values columns [duplicate]
(5 answers)
Closed 1 year ago.
The dataframe consists of 3 columns: wine_id, taste_group and the evaluated matching score for each group:
wine_id taste_group score
22 tree_fruit 87
22 citrus_fruit 98
22 tropical_fruit 17
22 earth 8
22 microbio 6
22 oak 7
22 vegetal 1
How can I make a separate column for each taste_group and list the scores in rows?
Like this:
wine_id tree_fruit citrus_fruit tropical_fruit earth microbio oak vegetal
22 87 98 17 8 6 7 1
There are 13 taste groups overall, along with more than 6000 Wines.
If a wine doesn't have a score for a taste_group, that cell should take the value 0.
I used
length(unique(tastes$Group))
length(unique(tastes$Wine_Id))
in R to check these basic counts.
How do I get to the desired format?
Assuming your dataframe is named tastes, you'll want something like:
library(tidyr)
tastes %>%
# Get into desired wide format
pivot_wider(names_from = taste_group, values_from = score, values_fill = 0)
In R, this is called long-to-wide reshaping; you can also use dcast to do it.
library(data.table)
dt <- fread("
wine_id taste_group score
22 tree_fruit 87
22 citrus_fruit 98
22 tropical_fruit 17
22 earth 8
22 microbio 6
22 oak 7
22 vegetal 1
")
dcast(dt, wine_id ~ taste_group, value.var = "score", fill = 0)
#wine_id citrus_fruit earth microbio oak tree_fruit tropical_fruit vegetal
# <int> <int> <int> <int> <int> <int> <int> <int>
# 22 98 8 6 7 87 17 1
Consider base R's reshape (here my_data is the long dataframe):
wide_df <- reshape(
my_data,
timevar="taste_group",
v.names = "score",
idvar = "wine_id",
direction = "wide"
)
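Note that reshape names the new columns score.tree_fruit, score.citrus_fruit and so on, and leaves NA where a wine has no score. A minimal sketch of cleaning that up in base R follows; wine 23 is a made-up row added only so that a missing combination exists in the example.

```r
# Small long-format example; wine 23 is hypothetical and has only one score
tastes <- data.frame(
  wine_id = c(22, 22, 22, 23),
  taste_group = c("tree_fruit", "citrus_fruit", "oak", "oak"),
  score = c(87, 98, 7, 12)
)

wide <- reshape(
  tastes,
  idvar = "wine_id",
  timevar = "taste_group",
  v.names = "score",
  direction = "wide"
)

# Drop the "score." prefix that reshape() adds to the new column names
names(wide) <- sub("^score\\.", "", names(wide))

# Missing wine/taste combinations come back as NA; the question asks for 0
wide[is.na(wide)] <- 0

wide
```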

How to add rows to dataframe R with rbind

I know this is a classic question and there are also similar ones in the archive, but I feel like the answers did not really apply to this case. Basically I want to take one dataframe (covid cases in Berlin per district), calculate the sum of the columns and create a new dataframe with a column representing the name of the district and another one representing the total number. So I wrote
covid_bln <- read.csv('https://www.berlin.de/lageso/gesundheit/infektionsepidemiologie-infektionsschutz/corona/tabelle-bezirke-gesamtuebersicht/index.php/index/all.csv?q=', sep=';')
c_tot<-data.frame('district'=c(), 'number'=c())
for (n in colnames(covid_bln[3:14])){
x<-data.frame('district'=c(n), 'number'=c(sum(covid_bln$n)))
c_tot<-rbind(c_tot, x)
next
}
print(c_tot)
This works properly for the names, but it returns the number of cases for the 8th district only, repeated for every district. Any suggestions, even ones involving other functions, would be great. Thank you.
Here's a base R solution:
number <- colSums(covid_bln[3:14])
district <- names(covid_bln[3:14])
c_tot <- cbind.data.frame(district, number)
# If you don't want rownames:
rownames(c_tot) <- NULL
This gives us:
district number
1 mitte 16030
2 friedrichshain_kreuzberg 10679
3 pankow 10849
4 charlottenburg_wilmersdorf 10664
5 spandau 9450
6 steglitz_zehlendorf 9218
7 tempelhof_schoeneberg 12624
8 neukoelln 14922
9 treptow_koepenick 6760
10 marzahn_hellersdorf 6960
11 lichtenberg 7601
12 reinickendorf 9752
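As for why the original loop repeats one district: covid_bln$n does not use the loop variable n, it asks for a column literally named "n". Because $ allows partial matching on data frames, that most likely resolved to neukoelln (the 8th district) on every iteration. Indexing with [[n]] uses the value stored in n instead. The sketch below shows this on a made-up three-column data frame, not the real download.

```r
# Hypothetical miniature of the data; the real file has 12 district columns
covid <- data.frame(
  mitte = c(1, 2),
  neukoelln = c(10, 20),
  pankow = c(5, 5)
)

# What the original loop computed on every iteration:
# $n partially matches the only column starting with "n"
sum(covid$n)

c_tot <- data.frame(district = character(), number = numeric())
for (n in colnames(covid)) {
  # [[n]] looks up the column whose name is stored in the variable n
  x <- data.frame(district = n, number = sum(covid[[n]]))
  c_tot <- rbind(c_tot, x)
}
c_tot
```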
Here is a solution using the tidyverse.
The final result is ordered alphabetically by district:
library(dplyr)
library(tidyr)
c_tot <- covid_bln %>%
select( mitte:reinickendorf) %>%
gather(district, number, mitte:reinickendorf) %>%
group_by(district) %>%
summarise(number = sum(number))
The result is
# A tibble: 12 x 2
district number
* <chr> <int>
1 charlottenburg_wilmersdorf 10736
2 friedrichshain_kreuzberg 10698
3 lichtenberg 7644
4 marzahn_hellersdorf 7000
5 mitte 16064
6 neukoelln 14982
7 pankow 10885
8 reinickendorf 9784
9 spandau 9486
10 steglitz_zehlendorf 9236
11 tempelhof_schoeneberg 12656
12 treptow_koepenick 6788

Function for dataset manipulation in R: convert from 1 column to 2 columns [duplicate]

This question already has answers here:
How to reshape data from long to wide format
(14 answers)
Closed 2 years ago.
What function should I use to convert data like this [screenshot] into data like this [screenshot]?
Try reshaping, and next time please include data using dput() rather than screenshots:
library(dplyr)
library(tidyr)
#Data
df <- data.frame(Country=rep(c('Spain','Croatia'),5),
quarter=as.Date(c('2017-03-31','2017-03-31',
'2017-06-30','2017-06-30',
'2017-09-30','2017-09-30',
'2017-12-31','2017-12-31',
'2018-03-31','2018-03-31')),
employment_index=runif(10,98,107))
#Code
new <- df %>% pivot_wider(names_from = Country,values_from=employment_index)
Output:
# A tibble: 5 x 3
quarter Spain Croatia
<date> <dbl> <dbl>
1 2017-03-31 99.8 99.1
2 2017-06-30 103. 105.
3 2017-09-30 98.7 103.
4 2017-12-31 103. 104.
5 2018-03-31 100. 99.9
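For the dput() suggestion, a minimal sketch is shown below. The two-row data frame is made up for illustration; the point is that dput() prints R code that rebuilds the object, so the printed text can be pasted straight into a question.

```r
# Made-up example data
df <- data.frame(
  Country = c("Spain", "Croatia"),
  employment_index = c(99.8, 99.1)
)

# Prints a structure(...) call that can be pasted into a question
dput(df)

# Round trip: capture the printed text and evaluate it again
txt <- paste(capture.output(dput(df)), collapse = "\n")
df_copy <- eval(parse(text = txt))

# The rebuilt object should match the original
all.equal(df, df_copy)
```

For large datasets, dput(head(df, 20)) keeps the pasted text short.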

Using custom order to arrange rows after previous sorting with arrange

I know this has already been asked, but I think my issue is a bit different (nevermind if it is in Portuguese).
I have this dataset:
df <- cbind(c(rep(2012,6),rep(2016,6)),
rep(c('Emp.total',
'Fisicas.total',
'Outros,total',
'Politicos.total',
'Receitas.total',
'Proprio.total'),2),
runif(12,0,1))
colnames(df) <- c('Year','Variable','Value')
I want to order the rows to group first everything that has the same year. Afterwards, I want the Variable column to be ordered like this:
Receitas.total
Fisicas.total
Emp.total
Politicos.total
Proprio.total
Outros.total
I know I could use arrange() from dplyr to sort by the year. However, I do not know how to combine this with any routine using factor and order without messing up the previous ordering by year.
Any help? Thank you
We create a custom order by converting 'Variable' into a factor whose levels are given in the desired order, and use it as the second sort key:
library(dplyr)
df %>%
arrange(Year, factor(Variable, levels = c('Receitas.total',
'Fisicas.total', 'Emp.total', 'Politicos.total',
'Proprio.total', 'Outros.total')))
# A tibble: 12 x 3
# Year Variable Value
# <dbl> <chr> <dbl>
# 1 2012 Receitas.total 0.6626196
# 2 2012 Fisicas.total 0.2248911
# 3 2012 Emp.total 0.2925740
# 4 2012 Politicos.total 0.5188971
# 5 2012 Proprio.total 0.9204438
# 6 2012 Outros,total 0.7042230
# 7 2016 Receitas.total 0.6048889
# 8 2016 Fisicas.total 0.7638205
# 9 2016 Emp.total 0.2797356
#10 2016 Politicos.total 0.2547251
#11 2016 Proprio.total 0.3707349
#12 2016 Outros,total 0.8016306
data
set.seed(24)
df <- tibble(Year =c(rep(2012,6),rep(2016,6)),
Variable = rep(c('Emp.total',
'Fisicas.total',
'Outros,total',
'Politicos.total',
'Receitas.total',
'Proprio.total'),2),
Value = runif(12,0,1))
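The same idea also works without dplyr. The sketch below uses order() with match() as the second key, on a smaller made-up data frame (three variables, fixed values) so the result is easy to follow; custom_order is just a name chosen for this illustration.

```r
# Made-up data, deliberately not sorted by year
df <- data.frame(
  Year = rep(c(2016, 2012), each = 3),
  Variable = rep(c("Emp.total", "Receitas.total", "Fisicas.total"), 2),
  Value = 1:6
)

# Desired order of the Variable column within each year
custom_order <- c("Receitas.total", "Fisicas.total", "Emp.total")

# Sort by Year first, then by each Variable's position in custom_order
sorted <- df[order(df$Year, match(df$Variable, custom_order)), ]
sorted
```

Values that do not appear in custom_order get NA from match(), and order() places them last within their year.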

Reshape data from wide to long? [duplicate]

This question already has answers here:
Reshaping data.frame from wide to long format
(8 answers)
Closed 6 years ago.
How do I reshape this wide data: (from a csv file)
Name Code Indicator 1960 1961 1962
Into this long format?
Name Code Indicator Year
The reshape2 package does this nicely with the function melt.
library(reshape2)
yourdata_melted <- melt(yourdata, id.vars=c('Name', 'Code', 'Indicator'), variable.name='Year')
This will add a column named value that you can drop:
yourdata_melted$value <- NULL
And just because I like to continue my campaign for using base R functions:
Test data:
test <- data.frame(matrix(1:12,nrow=2))
names(test) <- c("name","code","indicator","1960","1961","1962")
test
name code indicator 1960 1961 1962
1 1 3 5 7 9 11
2 2 4 6 8 10 12
Now reshape it!
reshape(
test,
idvar=c("name","code","indicator"),
varying=c("1960","1961","1962"),
timevar="year",
v.names="value",
times=c("1960","1961","1962"),
direction="long"
)
# name code indicator year value
#1.3.5.1960 1 3 5 1960 7
#2.4.6.1960 2 4 6 1960 8
#1.3.5.1961 1 3 5 1961 9
#2.4.6.1961 2 4 6 1961 10
#1.3.5.1962 1 3 5 1962 11
#2.4.6.1962 2 4 6 1962 12
With tidyr:
library(tidyr)
gather(test, "time", "value", 4:6)
Data
test <- data.frame(matrix(1:12,nrow=2))
names(test) <- c("name","code","indicator","1960","1961","1962")
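In current versions of tidyr, gather() is superseded and pivot_longer() is the recommended function for new code. A sketch of the equivalent call on the same test data is below; it assumes tidyr 1.0.0 or later is installed, and the output column names year and value are just choices made here.

```r
library(tidyr)

# Same test data as above
test <- data.frame(matrix(1:12, nrow = 2))
names(test) <- c("name", "code", "indicator", "1960", "1961", "1962")

# Stack columns 4 to 6: their names go to "year", their cells to "value"
long <- pivot_longer(
  test,
  cols = 4:6,
  names_to = "year",
  values_to = "value"
)
long
```

Unlike gather(), which stacks one source column after another, pivot_longer() returns the rows grouped by original row, so the three years of the first row come first.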

Resources