Functions R Programming

I have revenue information in cumulative form for the whole year, and I would like to get monthly revenue from it. Let's say first-month revenue is 3.2M and second-month revenue is 2.2M, but my second entry is the sum of the first two months.
Revenue
3.2
5.4
7.6
9.2
I would like to extract revenue as below
ExRevenue
3.2
2.2
2.2
1.6
How can I extract the monthly revenue using R functions? Please help.

You could do
df <- read.table(header = TRUE, text = "Revenue
3.2
5.4
7.6
9.2")
# diff() returns successive differences; prepend the first value unchanged
df$ExRevenue <- c(df$Revenue[1], diff(df$Revenue))
df
# Revenue ExRevenue
# 1 3.2 3.2
# 2 5.4 2.2
# 3 7.6 2.2
# 4 9.2 1.6
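If you prefer a dplyr approach, the same de-cumulation can be sketched with lag() (this assumes the dplyr package; subtracting the lagged cumulative value with default = 0 leaves the first month unchanged):

```r
library(dplyr)

df <- data.frame(Revenue = c(3.2, 5.4, 7.6, 9.2))

# Revenue - lag(Revenue) undoes the cumulative sum;
# default = 0 keeps the first month's value as-is
df <- df %>% mutate(ExRevenue = Revenue - lag(Revenue, default = 0))
df$ExRevenue
# 3.2 2.2 2.2 1.6
```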


Reshaping a dataset [duplicate]

I'm very new to R and still not very good at it.
I am working with a dataframe that looks like this:
ID ESG var Y2009 Y2010 Y2011
A ESG score 5.1 3.5 4.8
A Emissions 3.0 1.4 1.3
B ESG score 6.5 4.6 2.1
B Emissions 3.6 1.9 1.6
but I would like to reshape it and make it look like:
ID YEARS ESG score Emissions
A 2009 5.1 3.0
A 2010 3.5 1.4
A 2011 4.8 1.3
B 2009 6.5 3.6
B 2010 4.6 1.9
B 2011 2.1 1.6
I need one year variable that takes three values (2009, 2010, 2011) and two ESG variables (ESG score and Emissions) that take the corresponding numerical values.
I tried the functions reshape() and melt(), but I couldn't find a good way. Can someone help me, please?
library(reshape)
out <- cast(melt(df, id = c("ID", "ESG.var")), ID + variable ~ ESG.var, value = "value")
out[, 2] <- as.numeric(gsub("Y", "", out[, 2]))  # strip the "Y" prefix to get numeric years
colnames(out)[2] <- "YEARS"
out
gives,
ID YEARS Emissions ESG score
1 A 2009 3.0 5.1
2 A 2010 1.4 3.5
3 A 2011 1.3 4.8
4 B 2009 3.6 6.5
5 B 2010 1.9 4.6
6 B 2011 1.6 2.1
Data:
df <- read.table(text="ID 'ESG var' Y2009 Y2010 Y2011
A 'ESG score' 5.1 3.5 4.8
A 'Emissions' 3.0 1.4 1.3
B 'ESG score' 6.5 4.6 2.1
B 'Emissions' 3.6 1.9 1.6",header=T, stringsAsFactors=FALSE)
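A more current sketch with tidyr's pivot functions (this assumes tidyr >= 1.0 and dplyr; pivot_longer stacks the year columns, then pivot_wider spreads the two ESG variables back out):

```r
library(tidyr)
library(dplyr)

df <- read.table(text = "ID 'ESG var' Y2009 Y2010 Y2011
A 'ESG score' 5.1 3.5 4.8
A 'Emissions' 3.0 1.4 1.3
B 'ESG score' 6.5 4.6 2.1
B 'Emissions' 3.6 1.9 1.6", header = TRUE, stringsAsFactors = FALSE)

out <- df %>%
  # stack Y2009..Y2011 into one YEARS column, stripping the "Y" prefix
  pivot_longer(starts_with("Y"), names_to = "YEARS", names_prefix = "Y") %>%
  # spread the two ESG variables into their own columns
  pivot_wider(names_from = "ESG.var", values_from = "value") %>%
  mutate(YEARS = as.numeric(YEARS)) %>%
  arrange(ID, YEARS)
out
```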

How can I plot a whole scaled data set with one line per row?

I have a data set with 6 alternatives. I used the scale function to standardize my data set. Now I want to plot a curve for each row.
I want my six alternatives as values on the x-axis and their standardized values on the y-axis.
A sample of my data set is like this:
fish rice meet milk
1 2.3 3.4 1.4 1.3
1 2.6 3.5 2.4 2.4
1 4.3 1.9 3.3 3.1
1 1.3 2.4 4.4 9.3
2 1.3 3.4 4.1 3.4
2 3.3 2.9 3.3 2.1
2 4.5 3.9 3.3 3.1
2 1.4 2.4 4.4 9.3
where the first column is the individual; in this sample we have 2 people.
Now I want to draw a curve for each row, so on the x-axis I have (fish, rice, meet, milk) and on the y-axis I have these numbers.
For example, the first curve is formed by connecting the points 2.3, 3.4, 1.4, 1.3 on the y-axis.
Since you want each row to be a separate group, you should make row number a variable to preserve that information, then reshape the data from wide to long so you can plot it properly in ggplot2:
library(tidyverse)
df1 = df %>%
  rowid_to_column('row') %>%
  gather(key, value, -row)
head(df1)
row key value
1 1 fish 2.3
2 2 fish 2.6
3 3 fish 4.3
4 4 fish 1.3
5 1 rice 3.4
6 2 rice 3.5
# group is needed to tell ggplot which points to connect in each line
ggplot(df1, aes(x = key, y = value, color = factor(row), group = row)) +
geom_line()
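In base R, the same picture can be sketched with matplot(), which draws one line per matrix column, so the data frame is transposed first (this assumes df holds only the four food columns, as in the ggplot2 answer above; the data below is a hypothetical stand-in):

```r
# hypothetical data with one column per food, one row per observation
df <- data.frame(fish = c(2.3, 2.6, 4.3, 1.3),
                 rice = c(3.4, 3.5, 1.9, 2.4),
                 meet = c(1.4, 2.4, 3.3, 4.4),
                 milk = c(1.3, 2.4, 3.1, 9.3))

# transpose: after t(), each original row becomes a column, i.e. one line per row
mat <- t(as.matrix(df))
matplot(mat, type = "l", xaxt = "n", xlab = "alternative", ylab = "value")
axis(1, at = seq_len(nrow(mat)), labels = rownames(mat))
```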

How to match rows from one df with columns in another df in R

Basically I've got 2 tables or data frames (I think that's the term?). One of them has the identifier in the rows, the other has it in the columns, like below:
df 1
Id Location
34 Hunter Region
35 Hunter Region
36 Western Region
37 Western Region
38 Western Region
...
df 2
Date 34 35 36 37 38
15/01/18 1.5 2.4 1.4 1.6 2.2
16/01/18 1.5 2.4 1.4 1.6 2.2
17/01/18 1.5 2.4 1.4 1.6 2.2
...
What I want to do is separate df2 into new tables based on the Region (e.g. one for Hunter Region, and one for Western Region)
To separate dataframe df2 into Hunter and Western Region columns you could do:
create two selectors:
sel_hunter = as.character(df1$Id[df1$Location=="Hunter Region"])
sel_western = as.character(df1$Id[df1$Location=="Western Region"])
add the "Date" column to these selectors:
sel_hunter = c("Date", sel_hunter)
sel_western = c("Date", sel_western)
and then proceed to separate df2 into two dataframes:
df2_hunter = df2[ , sel_hunter]
Date 34 35
1 15/01/18 1.5 2.4
2 16/01/18 1.5 2.4
3 17/01/18 1.5 2.4
df2_western = df2[ , sel_western]
Date 36 37 38
1 15/01/18 1.4 1.6 2.2
2 16/01/18 1.4 1.6 2.2
3 17/01/18 1.4 1.6 2.2
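If there are more than two regions, the selection can be generalized with split(), building one data frame per region in a single pass. This is a sketch; it assumes df2's columns are literally named by the Id values (e.g. read with check.names = FALSE), and the sample data below is reconstructed from the question:

```r
df1 <- data.frame(Id = 34:38,
                  Location = c("Hunter Region", "Hunter Region",
                               "Western Region", "Western Region", "Western Region"))
df2 <- data.frame(Date = c("15/01/18", "16/01/18", "17/01/18"),
                  "34" = 1.5, "35" = 2.4, "36" = 1.4, "37" = 1.6, "38" = 2.2,
                  check.names = FALSE)

# one character vector of Ids per region, then subset df2 by those columns
ids_by_region <- split(as.character(df1$Id), df1$Location)
dfs_by_region <- lapply(ids_by_region, function(ids) df2[, c("Date", ids)])

names(dfs_by_region)
# "Hunter Region" "Western Region"
```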

Cumulative summing between groups using dplyr

I have a tibble structured as follows:
day theta
1 1 2.1
2 1 2.1
3 2 3.2
4 2 3.2
5 5 9.5
6 5 9.5
7 5 9.5
Note that the tibble contains multiple rows for each day, and for each day the same value for theta is repeated an arbitrary number of times. (The tibble contains other arbitrary columns necessitating this repeating structure.)
I'd like to use dplyr to cumulatively sum values for theta across days such that, in the example above, 2.1 is added only a single time to 3.2, etc. The tibble would be mutated so as to append the new cumulative sum (c.theta) as follows:
day theta c.theta
1 1 2.1 2.1
2 1 2.1 2.1
3 2 3.2 5.3
4 2 3.2 5.3
5 5 9.5 14.8
6 5 9.5 14.8
7 5 9.5 14.8
...
My initial efforts to group_by day and then cumsum over theta resulted only in cumulative summing over the full set of data (e.g., 2.1 + 2.1 + 3.2 ...) which is undesirable. In my Stack Overflow searches, I can find many examples of cumulative summing within groups, but never between groups, as I describe above. Nudges in the right direction would be much appreciated.
Doing this in dplyr I came up with a very similar solution to PoGibas - use distinct to just get one row per day, find the sum and merge back in:
df = read.table(text="day theta
1 1 2.1
2 1 2.1
3 2 3.2
4 2 3.2
5 5 9.5
6 5 9.5
7 5 9.5", header = TRUE)
cumsums = df %>%
distinct(day, theta) %>%
mutate(ctheta = cumsum(theta))
df %>%
left_join(cumsums %>% select(day, ctheta), by = 'day')
Not dplyr, but an alternative data.table solution:
library(data.table)
# Original table is called d
setDT(d)
merge(d, unique(d)[, .(c.theta = cumsum(theta), day)])
day theta c.theta
1: 1 2.1 2.1
2: 1 2.1 2.1
3: 2 3.2 5.3
4: 2 3.2 5.3
5: 5 9.5 14.8
6: 5 9.5 14.8
7: 5 9.5 14.8
PS: If you want to preserve other columns, use unique(d[, .(day, theta)]) instead.
In base R you could use split<- and tapply to return the desired result.
# construct 0 vector to fill in
dat$temp <- 0
# fill in with cumulative sum for each day
split(dat$temp, dat$day) <- cumsum(tapply(dat$theta, dat$day, head, 1))
Here, tapply returns the first element of theta for each day, which is fed to cumsum. The cumulative sums are then assigned back to each day's rows using split<-.
This returns
dat
day theta temp
1 1 2.1 2.1
2 1 2.1 2.1
3 2 3.2 5.3
4 2 3.2 5.3
5 5 9.5 14.8
6 5 9.5 14.8
7 5 9.5 14.8
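It can also be done in a single dplyr mutate(), without a join: zero out the repeated theta values with !duplicated(day) before taking the cumulative sum (a sketch on the data above):

```r
library(dplyr)

df <- data.frame(day   = c(1, 1, 2, 2, 5, 5, 5),
                 theta = c(2.1, 2.1, 3.2, 3.2, 9.5, 9.5, 9.5))

# theta * !duplicated(day) keeps theta only on each day's first row,
# so cumsum() adds each day's value exactly once
df <- df %>% mutate(c.theta = cumsum(theta * !duplicated(day)))
df$c.theta
# 2.1 2.1 5.3 5.3 14.8 14.8 14.8
```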

select column in R with condition

I have a data frame as follows
V2 V4 V6 V8
1 5 5.2 5.1 4.8
2 4.4 4.1 4.5 4.3
3 4.2 3.8 4.2 4.1
4 5 3.2 3.3 4.0
In the actual data, the V columns go from V2 to V200 and the rows from 1 to 99. I want to select the columns whose values ever go below 4.
Result should be,
V4 V6
1 5.2 5.1
2 4.1 4.5
3 3.8 4.2
4 3.2 3.3
I also want to select the columns whose values never go below 4. The result should be:
V2 V8
1 5 4.8
2 4.4 4.3
3 4.2 4.1
4 5 4.0
I am trying with the subset command but have not been able to get it done yet.
You have not specified whether you want to do this for each row or for the whole data.frame. For the full data.frame:
mins <- sapply(df, min)
# use >= so a column whose minimum is exactly 4 (like V8) still counts as "never below 4"
moreThan4 <- df[, mins >= 4, drop = FALSE]
lessThan4 <- df[, mins < 4, drop = FALSE]
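An equivalent sketch without precomputing minimums uses colSums() on a logical matrix: df < 4 is TRUE wherever a value is below 4, so a column sum of zero means that column never dips below 4 (the data below is the sample from the question):

```r
df <- data.frame(V2 = c(5, 4.4, 4.2, 5),
                 V4 = c(5.2, 4.1, 3.8, 3.2),
                 V6 = c(5.1, 4.5, 4.2, 3.3),
                 V8 = c(4.8, 4.3, 4.1, 4.0))

# count how many values in each column fall below 4
ever_below4  <- df[, colSums(df < 4) > 0,  drop = FALSE]  # V4, V6
never_below4 <- df[, colSums(df < 4) == 0, drop = FALSE]  # V2, V8
```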
