Reshaping a dataset [duplicate] - r

I'm very new to R and I'm not very good at it still.
I am working with a dataframe that looks like this:
ID  ESG var    Y2009  Y2010  Y2011
A   ESG score  5.1    3.5    4.8
A   Emissions  3.0    1.4    1.3
B   ESG score  6.5    4.6    2.1
B   Emissions  3.6    1.9    1.6
but I would like to reshape it and make it look like:
ID  YEARS  ESG score  Emissions
A   2009   5.1        3.0
A   2010   3.5        1.4
A   2011   4.8        1.3
B   2009   6.5        3.6
B   2010   4.6        1.9
B   2011   2.1        1.6
I need a single year variable that takes three values (2009, 2010, 2011), and two ESG variables (ESG score and Emissions) that hold the corresponding numeric values.
I tried the functions reshape() and melt(), but I couldn't find a way that works.
Can someone help me please?

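library(reshape)
# melt to long format (one row per ID, ESG.var and year), then cast ESG.var back out into columns
out <- cast(melt(df, id=c("ID","ESG.var")),ID+variable~ESG.var,value="value")
# strip the "Y" prefix so the year column is numeric
out[,2] <- as.numeric(gsub("Y","",out[,2]))
colnames(out)[2] <-"YEARS"
out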
gives,
ID YEARS Emissions ESG score
1 A 2009 3.0 5.1
2 A 2010 1.4 3.5
3 A 2011 1.3 4.8
4 B 2009 3.6 6.5
5 B 2010 1.9 4.6
6 B 2011 1.6 2.1
Data:
df <- read.table(text="ID 'ESG var' Y2009 Y2010 Y2011
A 'ESG score' 5.1 3.5 4.8
A 'Emissions' 3.0 1.4 1.3
B 'ESG score' 6.5 4.6 2.1
B 'Emissions' 3.6 1.9 1.6",header=TRUE, stringsAsFactors=FALSE)
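For comparison, here is a minimal sketch of the same reshape using tidyr instead of reshape. This is not the answer's method, just an alternative under the assumption that the tidyr package is installed; the intermediate name `long` is made up for illustration. The data is read exactly as in the Data block, so the 'ESG var' header becomes the column ESG.var.

```r
library(tidyr)

# same data as in the answer; read.table turns 'ESG var' into ESG.var
df <- read.table(text="ID 'ESG var' Y2009 Y2010 Y2011
A 'ESG score' 5.1 3.5 4.8
A 'Emissions' 3.0 1.4 1.3
B 'ESG score' 6.5 4.6 2.1
B 'Emissions' 3.6 1.9 1.6", header=TRUE, stringsAsFactors=FALSE)

# step 1: wide -> long, one row per ID / ESG.var / year;
# names_prefix drops the leading "Y" from the year column names
long <- pivot_longer(df, cols = -c(ID, ESG.var),
                     names_to = "YEARS", names_prefix = "Y",
                     values_to = "value")
long$YEARS <- as.integer(long$YEARS)

# step 2: long -> wide, one column per value of ESG.var
out <- pivot_wider(long, names_from = ESG.var, values_from = value)
out
```

The two steps mirror melt() and cast(): first stack the year columns, then spread the ESG variable back out into columns.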

Related

How can I plot a whole scaled data set with one line per row?

I have a data set with 6 alternatives. I used the scale function to standardize my data set. Now I want to plot a curve for each row.
I want to have my six alternatives as values on the x-axis and their standardized values on the y-axis.
A sample of my data set is like this:
fish rice meet milk
1 2.3 3.4 1.4 1.3
1 2.6 3.5 2.4 2.4
1 4.3 1.9 3.3 3.1
1 1.3 2.4 4.4 9.3
2 1.3 3.4 4.1 3.4
2 3.3 2.9 3.3 2.1
2 4.5 3.9 3.3 3.1
2 1.4 2.4 4.4 9.3
where the first column is the individual; this sample has 2 people.
Now I want to draw a curve for each row, so that the x-axis has (fish, rice, meet, milk) and the y-axis has these numbers.
For example, the first curve is formed by connecting the points 2.3, 3.4, 1.4, 1.3 on the y-axis.
Since you want each row to be a separate group, you should make row number a variable to preserve that information, then reshape the data from wide to long so you can plot it properly in ggplot2:
library(tidyverse)
df1 = df %>%
  rowid_to_column('row') %>%
  gather(key, value, -row)
head(df1)
row key value
1 1 fish 2.3
2 2 fish 2.6
3 3 fish 4.3
4 4 fish 1.3
5 1 rice 3.4
6 2 rice 3.5
# group is needed to tell ggplot which points to connect in each line
ggplot(df1, aes(x = key, y = value, color = factor(row), group = row)) +
  geom_line()
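If you would rather avoid the reshape, a base R sketch with matplot() draws one line per row as well. This is an alternative to the ggplot2 answer, not part of it. It assumes df holds only the four food columns (no individual column), and it uses just the first four rows of the sample; the name `m` is made up for illustration.

```r
# first four rows of the sample data, food columns only (an assumption)
df <- data.frame(fish = c(2.3, 2.6, 4.3, 1.3),
                 rice = c(3.4, 3.5, 1.9, 2.4),
                 meet = c(1.4, 2.4, 3.3, 4.4),
                 milk = c(1.3, 2.4, 3.1, 9.3))

# matplot() draws one line per matrix column, so transpose:
# each original row becomes a column, each food becomes an x position
m <- t(as.matrix(df))

matplot(m, type = "l", lty = 1, xaxt = "n", xlab = "", ylab = "value")
axis(1, at = seq_len(nrow(m)), labels = rownames(m))  # food names on the x-axis
```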

How to match rows from one df with columns in another df in R

Basically, I've got 2 tables, or data frames (I think that's the term?). One of them has the identifier in the rows, the other has it in the columns, like below:
df 1
Id Location
34 Hunter Region
35 Hunter Region
36 Western Region
37 Western Region
38 Western Region
...
df 2
Date 34 35 36 37 38
15/01/18 1.5 2.4 1.4 1.6 2.2
16/01/18 1.5 2.4 1.4 1.6 2.2
17/01/18 1.5 2.4 1.4 1.6 2.2
...
What I want to do is separate df2 into new tables based on the region (e.g. one for Hunter Region and one for Western Region).
To separate the data frame df2 into Hunter Region and Western Region columns, you could first create two selectors:
sel_hunter = as.character(df1$Id[df1$Location=="Hunter Region"])
sel_western = as.character(df1$Id[df1$Location=="Western Region"])
add the "Date" column to these selectors:
sel_hunter = c("Date", sel_hunter)
sel_western = c("Date", sel_western)
and then proceed to separate df2 into two dataframes:
df2_hunter = df2[ , sel_hunter]
Date 34 35
1 15/01/18 1.5 2.4
2 16/01/18 1.5 2.4
3 17/01/18 1.5 2.4
df2_western = df2[ , sel_western]
Date 36 37 38
1 15/01/18 1.4 1.6 2.2
2 16/01/18 1.4 1.6 2.2
3 17/01/18 1.4 1.6 2.2
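With more than two regions, the same idea can be sketched with split() and lapply(), so that no selector has to be written by hand. The example data below is rebuilt from the question; reading df2 with check.names = FALSE is an assumption that keeps the column names as "34", "35", ... instead of "X34", "X35", .... The names `ids_by_region` and `by_region` are made up for illustration.

```r
df1 <- data.frame(Id = 34:38,
                  Location = c("Hunter Region", "Hunter Region",
                               "Western Region", "Western Region",
                               "Western Region"),
                  stringsAsFactors = FALSE)

# check.names = FALSE keeps the numeric-looking headers unchanged
df2 <- read.table(text = "Date 34 35 36 37 38
15/01/18 1.5 2.4 1.4 1.6 2.2
16/01/18 1.5 2.4 1.4 1.6 2.2
17/01/18 1.5 2.4 1.4 1.6 2.2",
                  header = TRUE, check.names = FALSE, stringsAsFactors = FALSE)

# one character vector of Ids per region
ids_by_region <- split(as.character(df1$Id), df1$Location)

# one data frame per region, each keeping the Date column
by_region <- lapply(ids_by_region,
                    function(ids) df2[, c("Date", ids), drop = FALSE])

by_region[["Hunter Region"]]
```

The result is a named list, so each region's table is available as by_region[["Western Region"]] and so on.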

Functions R Programming

I have revenue information in cumulative form for the whole year, and I would like to get the monthly revenue. Let's say the first month's revenue is 3.2M and the second month's revenue is 2.2M, but my second entry is the sum of the first two months.
Revenue
3.2
5.4
7.6
9.2
I would like to extract revenue as below
ExRevenue
3.2
2.2
2.2
1.6
How can I extract the revenue using R functions? Please help.
You could do
df <- read.table(header=TRUE,text="Revenue
3.2
5.4
7.6
9.2")
df$ExRevenue <- c(df$Revenue[1], diff(df$Revenue))
df
# Revenue ExRevenue
# 1 3.2 3.2
# 2 5.4 2.2
# 3 7.6 2.2
# 4 9.2 1.6
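A small sketch of why this works: diff() is the inverse of cumsum(), so prepending a zero gives the same monthly values in one call, and cumsum() can be used to check the result. The variable names `revenue` and `monthly` are made up for illustration.

```r
revenue <- c(3.2, 5.4, 7.6, 9.2)   # cumulative values from the question

# prepend 0 so the first difference is the first month's revenue
monthly <- diff(c(0, revenue))
monthly

# sanity check: summing the monthly values back up recovers the input
stopifnot(isTRUE(all.equal(cumsum(monthly), revenue)))
```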

Extracting complete paired values (non-NA) from a matrix in R [duplicate]

I apologize if this is elementary or has been answered before, but I haven't found an answer to my question despite extensive searching. I'm also very new to programming so please bear with me here.
I have a bunch of 25 by 2 matrices of data; however, some of the cells have NA values. I'm looking to extract a subset of the matrix consisting of only the complete paired values (so no NA values).
So say I have:
3.6 4.2
9.2 8.4
4.8 NA
1.1 8.2
NA 11.6
NA NA
2.7 3.5
I want:
3.6 4.2
9.2 8.4
1.1 8.2
2.7 3.5
Is there some function that would do this easily?
Thanks!
Try this
df <- read.table(text = "3.6 4.2
9.2 8.4
4.8 NA
1.1 8.2
NA 11.6
NA NA
2.7 3.5")
df[complete.cases(df), ]
# V1 V2
# 1 3.6 4.2
# 2 9.2 8.4
# 4 1.1 8.2
# 7 2.7 3.5
Alternatively, keep the rows in which every value is non-NA:
df <- data.frame(V1 = c(3.6,9.2,4.8,1.1,NA,NA,2.7),
V2 = c(4.2,8.4,NA,8.2,11.6,NA,3.5))
df[ apply(!is.na(df), 1, all) , ]
EDIT: I forgot about na.omit and complete.cases. Doh.
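Since the question is about matrices rather than data frames, here is a short sketch showing that complete.cases() and na.omit() also accept a matrix directly. The names `m` and `clean` are made up for illustration.

```r
# the 7 x 2 example from the question, filled column by column
m <- matrix(c(3.6, 9.2, 4.8, 1.1, NA,   NA, 2.7,
              4.2, 8.4, NA,  8.2, 11.6, NA, 3.5), ncol = 2)

# keep only the rows with no NA; drop = FALSE keeps the result a matrix
clean <- m[complete.cases(m), , drop = FALSE]
clean

# na.omit() drops the same rows and records which ones in an attribute
na.omit(m)
```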

select column in R with condition

I have a data frame as follows
V2 V4 V6 V8
1 5 5.2 5.1 4.8
2 4.4 4.1 4.5 4.3
3 4.2 3.8 4.2 4.1
4 5 3.2 3.3 4.0
In the actual data, the V columns go from V2 to V200 and the rows go from 1 to 99. I want to select the columns whose values ever go below 4.
Result should be,
V4 V6
1 5.2 5.1
2 4.1 4.5
3 3.8 4.2
4 3.2 3.3
I also want to select the columns whose values never go below 4. The result should be
V2 V8
1 5 4.8
2 4.4 4.3
3 4.2 4.1
4 5 4.0
I am trying with subset command, but not able to get it done yet.
You have not specified whether you want to do this for each row or for the whole data.frame. For a full data.frame:
mins <- sapply(df, min)
# >= so that a column whose minimum is exactly 4 (V8 here) counts as never going below 4
moreThan4 <- df[which(mins >= 4)]
lessThan4 <- df[which(mins < 4)]
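A self-contained sketch of the same column selection on the sample data, written with any() so that the condition reads like the question ("ever goes below 4"). The names `below`, `ever_below_4` and `never_below_4` are made up for illustration, and na.rm = TRUE is an assumption about how missing values should be treated.

```r
df <- data.frame(V2 = c(5, 4.4, 4.2, 5),
                 V4 = c(5.2, 4.1, 3.8, 3.2),
                 V6 = c(5.1, 4.5, 4.2, 3.3),
                 V8 = c(4.8, 4.3, 4.1, 4.0))

# TRUE for each column that contains at least one value below 4
below <- sapply(df, function(col) any(col < 4, na.rm = TRUE))

ever_below_4  <- df[, below, drop = FALSE]    # columns that dip below 4
never_below_4 <- df[, !below, drop = FALSE]   # columns that stay at 4 or above

names(ever_below_4)
names(never_below_4)
```

Using drop = FALSE keeps the result a data frame even when only one column matches.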
