Reshape data from wide to long? [duplicate] - r

This question already has answers here:
Reshaping data.frame from wide to long format
(8 answers)
Closed 6 years ago.
How do I reshape this wide data: (from a csv file)
Name Code Indicator 1960 1961 1962
Into this long format?
Name Code Indicator Year

The reshape2 package does this nicely with the function melt:
library(reshape2)
yourdata_melted <- melt(yourdata, id.vars=c('Name', 'Code', 'Indicator'), variable.name='Year')
This will add a value column, which you can drop if you do not need it:
yourdata_melted$value <- NULL
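As a minimal sketch of the melt approach, assuming a small made-up data frame in place of the asker's CSV (the values and the name yourdata are illustrative only): melt returns the new Year column as a factor, so it is converted to integer here in case numeric years are wanted.

```r
library(reshape2)

# Hypothetical stand-in for the CSV; check.names = FALSE keeps "1960" etc. as column names
yourdata <- data.frame(
  Name = c("A", "B"),
  Code = c("a1", "b1"),
  Indicator = c("pop", "pop"),
  `1960` = c(1, 2),
  `1961` = c(3, 4),
  `1962` = c(5, 6),
  check.names = FALSE
)

# Stack the year columns; the former column names go into Year
yourdata_melted <- melt(
  yourdata,
  id.vars = c("Name", "Code", "Indicator"),
  variable.name = "Year"
)

# Year is a factor after melt; go through character to recover the numbers
yourdata_melted$Year <- as.integer(as.character(yourdata_melted$Year))
```

Going through as.character first matters: calling as.integer directly on a factor would return the level codes (1, 2, 3) rather than the years.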

And just because I like to continue my campaign for using base R functions:
Test data:
test <- data.frame(matrix(1:12,nrow=2))
names(test) <- c("name","code","indicator","1960","1961","1962")
test
  name code indicator 1960 1961 1962
1    1    3         5    7    9   11
2    2    4         6    8   10   12
Now reshape it!
reshape(
test,
idvar=c("name","code","indicator"),
varying=c("1960","1961","1962"),
timevar="year",
v.names="value",
times=c("1960","1961","1962"),
direction="long"
)
#           name code indicator year value
#1.3.5.1960    1    3         5 1960     7
#2.4.6.1960    2    4         6 1960     8
#1.3.5.1961    1    3         5 1961     9
#2.4.6.1961    2    4         6 1961    10
#1.3.5.1962    1    3         5 1962    11
#2.4.6.1962    2    4         6 1962    12

With tidyr:
library(tidyr)
gather(test, "time", "value", 4:6)
Data
test <- data.frame(matrix(1:12,nrow=2))
names(test) <- c("name","code","indicator","1960","1961","1962")
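In current tidyr, gather is superseded by pivot_longer. A minimal sketch of the same reshape on the test data above, assuming tidyr 1.0.0 or later is installed (the names long, year and value are illustrative choices):

```r
library(tidyr)

test <- data.frame(matrix(1:12, nrow = 2))
names(test) <- c("name", "code", "indicator", "1960", "1961", "1962")

# Columns 4 to 6 hold the yearly values; their names become the year column
long <- pivot_longer(test, cols = 4:6, names_to = "year", values_to = "value")

# The former column names arrive as character; convert if numeric years are needed
long$year <- as.integer(long$year)
```

Unlike gather, which stacks one source column after another, pivot_longer keeps the rows of each original record together, so the result is ordered by input row and then by year.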

Related

pivot_longer with names_pattern [duplicate]

This question already has an answer here:
How to use Pivot_longer to reshape from wide-type data to long-type data with multiple variables
(1 answer)
Closed 1 year ago.
I am quite new to programming, but I need a script that is reproducible for large datasets. I hope I have provided a sufficient example.
I have a dataframe like this (with 8 more "Nutrients" and 5 more "trade-elements" and much more Years):
Year<-c(1961,1962)
Total_Energy_kcal_Production<-c(5,8)
Total_Energy_kcal_Import<-c(6,1)
Total_Ca_g_Production<-c(3,4)
Total_Ca_g_Import<-c(3,8)
df<-cbind(Year,Total_Energy_kcal_Production, Total_Energy_kcal_Import, Total_Ca_g_Production, Total_Ca_g_Import)
looks like:
Year  Total_Energy_kcal_Production  Total_Energy_kcal_Import  Total_Ca_g_Production  Total_Ca_g_Import
1961                             5                         6                      3                  3
1962                             8                         1                      4                  8
and I want it to look like this:
Year  Nutrient           Production  Import
1961  Total_Energy_kcal           5       6
1962  Total_Energy_kcal           8       1
1961  Total_Ca_g                  3       3
1962  Total_Ca_g                  4       8
I tried a lot with pivot_longer and names_pattern. I thought this would work, but I do not fully understand the arguments:
df_piv <- df %>%
  pivot_longer(cols = -Year, names_to = "Nutrient",
               names_pattern = ".*(?=_)")
I get an error message that I cannot interpret:
Error: Can't select within an unnamed vector.
You can provide the names_pattern regex as:
tidyr::pivot_longer(df,
cols = -Year,
names_to = c('Nutrient', '.value'),
names_pattern = '(.*)_(\\w+)')
#   Year Nutrient          Production Import
#  <dbl> <chr>                  <dbl>  <dbl>
#1  1961 Total_Energy_kcal          5      6
#2  1961 Total_Ca_g                 3      3
#3  1962 Total_Energy_kcal          8      1
#4  1962 Total_Ca_g                 4      8
This puts everything up to the last underscore into the Nutrient column; the remainder of each name is kept as the name of a value column, which is what '.value' in names_to requests.
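To see how the pattern splits the names, here is a small base R sketch that applies the same regex with sub. It only illustrates the two capture groups and is not how tidyr applies the pattern internally; the variable names are illustrative.

```r
nms <- c("Total_Energy_kcal_Production", "Total_Ca_g_Import")
pattern <- "(.*)_(\\w+)"

# Group 1: the greedy (.*) takes as much as it can, so it stops at the LAST underscore
nutrient <- sub(pattern, "\\1", nms)

# Group 2: what follows the last underscore; this part becomes a column name via '.value'
value_col <- sub(pattern, "\\2", nms)

nutrient   # "Total_Energy_kcal" "Total_Ca_g"
value_col  # "Production" "Import"
```

Note that \w also matches underscores, so it is the greediness of the first group, not the second group, that pins the split to the last underscore.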
data
cbind creates a matrix; use data.frame to create a data frame.
df<-data.frame(Year,Total_Energy_kcal_Production,Total_Energy_kcal_Import,
Total_Ca_g_Production, Total_Ca_g_Import)

Transpose column and group dataframe [duplicate]

This question already has answers here:
How to reshape data from long to wide format
(14 answers)
Closed 5 years ago.
I'm trying to change a dataframe in R to group multiple rows by a measurement. The table has a location (km), a size (mm), a count of things in that size bin, a site and a year. I want to take the sizes, make a column from each one (2, 4 and 6 in this example), and place the corresponding count into the row for that location, site and year.
It seems like a combination of transposing and grouping, but I can't figure out a way to accomplish this in R. I've looked at t(), dcast() and aggregate(), but those aren't really close at all.
So I would go from something like this:
df <- data.frame(km=c(rep(32,3),rep(50,3)), mm=rep(c(2,4,6),2), count=sample(1:25,6), site=rep("A", 6), year=rep(2013, 6))
  km mm count site year
1 32  2    18    A 2013
2 32  4     2    A 2013
3 32  6    12    A 2013
4 50  2     3    A 2013
5 50  4    17    A 2013
6 50  6    21    A 2013
To this:
  km site year mm_2 mm_4 mm_6
1 32    A 2013   18    2   12
2 50    A 2013    3   17   21
Edit: I tried the solution in a suggested duplicate, but it did not work for me; I am not really sure why. The answer below worked better.
As suggested in the comment above, we can use the sep argument in spread:
library(tidyr)
spread(df, mm, count, sep = "_")
  km site year mm_2 mm_4 mm_6
1 32    A 2013    4   20    1
2 50    A 2013   15   14   22
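In current tidyr, spread is superseded by pivot_wider. A minimal sketch of the equivalent call, assuming tidyr 1.0.0 or later and using the fixed counts from the question's printed table instead of sample() so the result is reproducible:

```r
library(tidyr)

# Counts copied from the question's example table (no random sampling)
df <- data.frame(
  km = rep(c(32, 50), each = 3),
  mm = rep(c(2, 4, 6), 2),
  count = c(18, 2, 12, 3, 17, 21),
  site = "A",
  year = 2013
)

# One column per size bin; names_prefix plays the role of sep = "_" in spread
wide <- pivot_wider(df, names_from = mm, values_from = count, names_prefix = "mm_")
```

All columns not named in names_from or values_from (km, site and year here) are treated as identifiers, so one row is produced per km, site and year combination.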
As you mentioned dcast(), here is a method using it.
set.seed(1)
df <- data.frame(km=c(rep(32,3),rep(50,3)),
mm=rep(c(2,4,6),2),
count=sample(1:25,6),
site=rep("A", 6),
year=rep(2013, 6))
library(reshape2)
dcast(df, ... ~ mm, value.var="count")
#   km site year  2  4  6
# 1 32    A 2013 13 10 20
# 2 50    A 2013  3 17  1
And if you want a bit of a challenge you can try the base function reshape().
df2 <- reshape(df, v.names="count", idvar="km", timevar="mm", ids="mm", direction="wide")
colnames(df2) <- sub("^count\\.", "mm_", colnames(df2))
df2
#   km site year mm_2 mm_4 mm_6
# 1 32    A 2013   13   10   20
# 4 50    A 2013    3   17    1

Merge records in data frame in R [duplicate]

This question already has answers here:
How to reshape data from long to wide format
(14 answers)
Convert data from long format to wide format with multiple measure columns
(6 answers)
Closed 6 years ago.
I have the following example data set:
data.frame(SEX=c("M","F","M","F"),COMPLAINT=c("headache","headache", "dizziness", "dizziness"),
reports=c(5,4,9,12), users = c(1250,3460,2500,1850))
  SEX COMPLAINT reports users
1   M  headache       5  1250
2   F  headache       4  3460
3   M dizziness       9  2500
4   F dizziness      12  1850
My question is how to merge rows 1 and 2, and rows 3 and 4, so that my data frame is as follows:
  COMPLAINT reports_male reports_female users_male users_female
1  headache            5              4       1250         3460
2 dizziness            9             12       2500         1850
Anyone got a quick solution that I can use for a (much) larger dataset?
We can use the dcast from data.table which can take multiple value.var columns and is quite efficient on big datasets
library(data.table)
dcast(setDT(df1), COMPLAINT ~ SEX, value.var = c("reports", "users"))
#   COMPLAINT reports_F reports_M users_F users_M
#1: dizziness        12         9    1850    2500
#2:  headache         4         5    3460    1250
As seen in How to reshape data from long to wide format?, we can also use reshape() from base R (no extra package is needed; reshape() is not part of reshape2), where df is the example data frame:
reshape(df, idvar = "COMPLAINT", timevar = "SEX", direction = "wide")
  COMPLAINT reports.M users.M reports.F users.F
1  headache         5    1250         4    3460
3 dizziness         9    2500        12    1850
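To get exactly the column names the question asks for (reports_male, users_female and so on), one option is to recode SEX before reshaping and pass sep = "_". This is a sketch in base R; the names df1 and wide and the recoding step are assumptions made for illustration.

```r
df1 <- data.frame(
  SEX = c("M", "F", "M", "F"),
  COMPLAINT = c("headache", "headache", "dizziness", "dizziness"),
  reports = c(5, 4, 9, 12),
  users = c(1250, 3460, 2500, 1850)
)

# Recode so that the generated column names carry the requested suffixes
df1$SEX <- ifelse(df1$SEX == "M", "male", "female")

# sep = "_" joins the measure name and the SEX value, e.g. reports_male
wide <- reshape(df1, idvar = "COMPLAINT", timevar = "SEX",
                direction = "wide", sep = "_")

# Put the columns in the requested order and drop the leftover row names
wide <- wide[, c("COMPLAINT", "reports_male", "reports_female",
                 "users_male", "users_female")]
rownames(wide) <- NULL
```

Because reshape picks up every column that is neither idvar nor timevar as a measure, this scales to more measure columns without listing them, although the explicit column reordering would then need to be extended.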

dcast - concatenate column values with column names [duplicate]

This question already has answers here:
how to spread or cast multiple values in r [duplicate]
(2 answers)
Closed 7 years ago.
I would like to concatenate column values with column names to create new columns. I am experimenting with library(reshape2), dcast however I can't get the required output.
Is there a method that doesn't involve performing dcast multiple times then merging the resulting sets back together?
Current data frame:
observation=c(1,1,1,2,2,2,3,3,3)
event=c('event1','event2','event3','event1','event2','event3','event1','event2','event3')
value1=c(1,2,3,4,5,6,7,8,9)
value2=c(11,12,13,14,15,16,17,18,19)
current=data.frame(observation,event,value1,value2)
current
Required data frame:
observation=c(1,2,3)
event1_value1 =c(1,4,7)
event2_value1 =c(2,5,8)
event3_value1 =c(3,6,9)
event1_value2 =c(11,14,17)
event2_value2 =c(12,15,18)
event3_value2 =c(13,16,19)
required=data.frame(observation,event1_value1,event2_value1,event3_value1,event1_value2,event2_value2,event3_value2)
required
The method below works but I feel there must be a quicker way!
library(reshape2)
value1 <- dcast(current,observation~event,value.var ="value1")
value2 <- dcast(current,observation~event,value.var ="value2")
merge(value1,value2,by="observation",suffixes = c("_value1","_value2"))
This is an extension of reshape from long to wide
You can use dcast from data.table, which can take multiple value.var columns. When this was written, that required the devel version, v1.9.5.
library(data.table)#v1.9.5+
dcast(setDT(current), observation~event, value.var=c('value1', 'value2'))
#   observation event1_value1 event2_value1 event3_value1 event1_value2
#1:           1             1             2             3            11
#2:           2             4             5             6            14
#3:           3             7             8             9            17
#   event2_value2 event3_value2
#1:            12            13
#2:            15            16
#3:            18            19
Or reshape from base R
reshape(current, idvar='observation', timevar='event', direction='wide')
#  observation value1.event1 value2.event1 value1.event2 value2.event2
#1           1             1            11             2            12
#4           2             4            14             5            15
#7           3             7            17             8            18
#  value1.event3 value2.event3
#1             3            13
#4             6            16
#7             9            19
I'm not sure of the efficiency but you could try this -
dcast(melt(current,id.vars = c('observation','event')),observation~event+variable)
  observation event1_value1 event1_value2 event2_value1 event2_value2 event3_value1 event3_value2
1           1             1            11             2            12             3            13
2           2             4            14             5            15             6            16
3           3             7            17             8            18             9            19
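For comparison, a tidyr sketch that produces the event-first column names from the "required" data frame in a single call. It assumes tidyr 1.1.0 or later for the names_glue argument; the name wide is illustrative.

```r
library(tidyr)

current <- data.frame(
  observation = rep(1:3, each = 3),
  event = rep(c("event1", "event2", "event3"), 3),
  value1 = 1:9,
  value2 = 11:19
)

# Spread both value columns at once; names_glue builds names such as event1_value1
wide <- pivot_wider(
  current,
  names_from = event,
  values_from = c(value1, value2),
  names_glue = "{event}_{.value}"
)
```

Without names_glue the default names would put the value column first (value1_event1 and so on), so the glue template is only needed to match the requested naming.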

Pivot table from column to row in excel or R [duplicate]

This question already has answers here:
Reshape data from wide to long? [duplicate]
(3 answers)
Closed 9 years ago.
I have a table with header like this
Id x.1960 x.1970 x.1980 x.1990 x.2000 y.1960 y.1970 y.1980 y.1990 y.2000
I want to pivot this table as
Id time x y
What is the best way to do this in Excel or R?
Something like this using base R reshape:
Get some data first
test <- read.table(text="Id x.1960 x.1970 x.1980 x.1990 x.2000 y.1960 y.1970 y.1980 y.1990 y.2000
a 1 2 3 4 5 6 7 8 9 10
b 10 20 30 40 50 60 70 80 90 100",header=TRUE)
Then reshape:
reshape(
test,
idvar="Id",
varying=list(2:6,7:11),
direction="long",
v.names=c("x","y"),
times=seq(1960,2000,10)
)
Or let reshape guess the names automatically based on the . separator:
reshape(
test,
idvar="Id",
varying=-1,
direction="long",
sep="."
)
Resulting in:
       Id time  x   y
a.1960  a 1960  1   6
b.1960  b 1960 10  60
a.1970  a 1970  2   7
b.1970  b 1970 20  70
a.1980  a 1980  3   8
b.1980  b 1980 30  80
a.1990  a 1990  4   9
b.1990  b 1990 40  90
a.2000  a 2000  5  10
b.2000  b 2000 50 100
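The result above is ordered by time and carries compound row names such as a.1960. If a table sorted by Id with plain row numbers is preferred, a small follow-up sketch in base R (the name long is illustrative):

```r
test <- read.table(text = "Id x.1960 x.1970 x.1980 x.1990 x.2000 y.1960 y.1970 y.1980 y.1990 y.2000
a 1 2 3 4 5 6 7 8 9 10
b 10 20 30 40 50 60 70 80 90 100", header = TRUE)

# Let reshape split names such as x.1960 into measure "x" and time 1960
long <- reshape(test, idvar = "Id", varying = -1, direction = "long", sep = ".")

# Sort by Id, then time, and replace the compound row names with 1..n
long <- long[order(long$Id, long$time), ]
rownames(long) <- NULL
```

The automatic split relies on every reshaped column following the measure.time naming scheme; columns that do not follow it would need the explicit varying, v.names and times form shown earlier.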
