Stacking two data frame columns into a single separate data frame column in R - r

I will present my question in two ways. First, requesting a solution for a task; and second, as a description of my overall objective (in case I am overthinking this and there is an easier solution).
1) Task Solution
Data context: each row contains four price variables (columns) representing (a) the price at which the respondent feels the product is too cheap; (b) the price that is perceived as a bargain; (c) the price that is perceived as expensive; (d) the price that is too expensive to purchase.
## mock data set
a<-c(1,5,3,4,5)
b<-c(6,6,5,6,8)
c<-c(7,8,8,10,9)
d<-c(8,10,9,11,12)
df<-as.data.frame(cbind(a,b,c,d))
## result
# a b c d
#1 1 6 7 8
#2 5 6 8 10
#3 3 5 8 9
#4 4 6 10 11
#5 5 8 9 12
Task Objective: The goal is to create a single column in a new data frame that lists all of the unique values contained in a, b, c, and d.
price
#1 1
#2 3
#3 4
#4 5
#5 6
...
#12 12
My initial thought was to use rbind() and unique()...
price<-rbind(df$a,df$b,df$c,df$d)
price<-unique(price)
...expecting that a, b, c and d would stack vertically.
[Pseudo illustration]
a[1]
a[2]
a[...]
a[n]
b[1]
b[2]
b[...]
b[n]
etc.
Instead, the "columns" are treated as rows and stacked horizontally.
V1 V2 V3 V4 V5
1 1 5 3 4 5
2 6 6 5 6 8
3 7 8 8 10 9
4 8 10 9 11 12
How may I stack a, b, c and d such that price consists of only one column ("V1") that contains all twenty responses? (The unique part I can handle separately afterwards).
2) Overall Objective: The Bigger Picture
Ultimately, I want to create a cumulative share of population for each price (too cheap, bargain, expensive, too expensive) at each price point (defined by the unique values described above). For example, what percentage of respondents felt $1 was too cheap, what percentage felt $3 or less was too cheap, etc.
The cumulative shares for bargain and expensive are later inverted to become not.bargain and not.expensive and the four vectors reside in a data frame like this:
buckets too.cheap not.bargain not.expensive too.expensive
1 0.01 to 0.50 0.000000000 1 1 0
2 0.51 to 1.00 0.000000000 1 1 0
3 1.01 to 1.50 0.000000000 1 1 0
4 1.51 to 2.00 0.000000000 1 1 0
5 2.01 to 2.50 0.001041667 1 1 0
6 2.51 to 3.00 0.001041667 1 1 0
...
from which I may plot something that looks like this:
Above, I accomplished my plotting objective using defined price buckets ($0.50 ranges) and the hist() function.
However, the intersections of these lines have meanings and I want to calculate the exact price at which any of the lines cross. This is difficult when the x-axis is defined by price range buckets instead of a specific value; hence the desire to switch to exact values and the need to generate the unique price variable.
[Postscript: This analysis is based on Peter Van Westendorp's Price Sensitivity Meter (https://en.wikipedia.org/wiki/Van_Westendorp%27s_Price_Sensitivity_Meter) which has known practical limitations but is relevant in the context of my research which will explore consumer perceptions of value under different treatments rather than defining an actual real-world price. I mention this for two reasons 1) to provide greater insight into my objective in case another approach comes to mind, and 2) to keep the thread focused on the mechanics rather than whether or not the Price Sensitivity Meter should be used.]

We can unlist the data.frame to a vector and get the sorted unique elements
sort(unique(unlist(df)))
When we do an rbind, it creates a matrix and unique of matrix calls the unique.matrix
methods('unique')
#[1] unique.array unique.bibentry* unique.data.frame unique.data.table* unique.default unique.IDate* unique.ITime*
#[8] unique.matrix unique.numeric_version unique.POSIXlt unique.warnings
which loops through the rows as the default MARGIN is 1 and then looks for unique elements. Instead, if we use the 'price', either as.vector or c(price) converts into vector
sort(unique(c(price)))
#[1] 1 3 4 5 6 7 8 9 10 11 12
If we use unique.default
sort(unique.default(price))
#[1] 1 3 4 5 6 7 8 9 10 11 12

Related

Count consecutive preceding elements in DolphinDB

Volume
f
Explanation
10
0
no volume before 10
7
0
no smaller volume before 7
13
2
Both 10 and 7 are smaller than 13
6
0
13 is larger than 6
4
0
6 is larger than 4
8
2
Both 6 and 4 are smaller than 8
7
0
8 is larger than 7
3
0
7 is larger than 3
4
1
3 is smaller than 4
As shown in the above table, I’d like to obtain the f column based on volume in DolphinDB.
Suppose the current volume is t, the desired output f is the count of volumes that meet the following conditions:
There are consecutive elements in volume column that are less than t
The last volume of the consecutive elements is the preceding volume
before t;
The calculation principle in detail is illustrated in the explanation column.
I tried for-loop but it didn't work. Does DolphinDB support any other functions to obtain the result?
t = table(1..10 as volume) tmp = select volume, iif(deltas(volume)>0, rowNo(volume), NULL) as flag from t tmp.bfill!() select volume, cumrank(volume) from tmp context by flag

Nested logit model using panel data in R

I am new to R and I would love it if you can help me with this because I am having serious difficulties.
I have unbalanced panel data that shows monthly companies' performance compared to the rest of the market in terms of $$ (eg. this month company 1 has made $1000 more than the average of the market). Each of these companies had decided on a strategy when they entered the market (1 through 8). These strategies are nested into two different groups (a,b) so that strategies 1,2, and 3 are part of the group a, while strategies 4 through 8 are part of group b. I would need a rank of the best strategies from best to worst.
I have discretized my DV so that now it only shows whether that month company 1 performed higher or lower than the market. However, I am not sure it is the right way because I would then lose how much better or worse each month companies performed compared to the market.
My data looks like this:
ID Main Strategy YearMonth DiffPerformance Control1 Control 2 DiffPerformanceHL
1 a 2 201706 9.037 2 57 H
1 a 2 201707 4.371 2 57 H
1 a 2 201708 1.633 2 57 H
1 a 2 201709 -3.521 2 59 L
1 a 2 201710 13.096 2 59 H
1 a 2 201711 5.070 2 60 H
1 a 2 201712 4.25 2 60 H
2 b 5 201904 6.78 4 171 H
2 b 5 201905 -15.26 4 169 L
2 b 5 201906 7.985 4 169 H
Where ID is the company, Main is the group (a or b) Strategies are 1 through 8 and nested as previously stated, YearMonth represents the specific month, DifferencePerformance is the DV as a continuous variable, Control 1 is static over time and is a categorical variable (1 through 6), Control 2 is a control count variable that changes over time, and DiffPerformance HL is the discretized DV.
Can you please help me figuring out how to create a nested logit model in R? I would be super appreciative
Thanks

Sum variables conditionally with loop in r

I realize this is a topic that's covered somewhat well but I couldn't find anything that approaches this specific concern:
I have a df with 800 columns, 10 iterations of 80 columns (each column represents an item) - Each column is named something like: 1_BL_PRE.1 1_FU_PRE.1 1_BL_PRE.1 1_BL_POST.1
Where the first '1' indicates the item number and the second '1' indicates the iteration number.
What I'm trying to figure out is how to get the sums of specific groups of items from all 10 iterations.
As a short example let's say I want to take the 1st and 3rd item of BL_PRE and get the sum of all 10 iterations for those 2 items - how would I do this?
subject 1_BL_PRE.1 2_BL_PRE.1 3_BL_PRE.1 1_BL_PRE.2 2_BL_PRE.2
1 40002 3 4 3 1 2
2 40004 1 2 3 4 4
3 40006 4 3 3 3 1
4 40008 2 3 1 2 3
5 40009 3 4 1 2 3
Expected output (where A represents the sum of 1_BL_PRE.1, 3_BL_PRE.1, 1_BL_PRE.2 and so on):
subject BL_PRE_A
1 40002 12
2 40004 14
3 40006 15
4 40008 20
5 40009 12
My hunch is the solution is related to a for-loop or lappy (and I'm not familiar at all with either). I'm trying to work with apply(finaldata,1,function(x) {sum(x ...)}) but I haven't been able to figure out the conditional statement for the function of sum.
If there's an implementation with plyr I'd be really curious to see what that looks like. (and if there's a thread that answers this, apologies and just re-direct!)
**Edited to include small example + code I'm trying to get to work
Thanks!

Efficient method of obtaining successive high values of data.frame column

Lets say I have the following data.frame in R
df <- data.frame(order=(1:10),value=c(1,7,3,5,9,2,9,10,2,3))
Other than looping through data an testing whether value exceeds previous high value how can I get successive high values so that I can end up with a table like this
order value
1 1
2 7
5 9
8 10
TIA
Here's one option, if I understood the question correct:
df[df$value > cummax(c(-Inf, head(df$value, -1))),]
# order value
#1 1 1
#2 2 7
#5 5 9
#8 8 10
I use cummax to keep track of the maximum of column "value" and compare it (the previous row's cummax) to each "value" entry. To make sure the first entry is also selected, I start by "-Inf".
"get successive high values (of value?)" is unclear.
It seems you want to filter only rows whose value is higher than previous max.
First, we reorder your df in increasing order of value... (not clear but I think that's what you wanted)
Then we use logical indexing with diff()>0 to only include strictly-increasing rows:
rdf <- df[order(df$value),]
rdf[ diff(rdf$value)>0, ]
order value
1 1 1
9 9 2
10 10 3
4 4 5
2 2 7
7 7 9
8 8 10

In R, how can I find unique pairs of numbers across rows when there are three columns?

I am working with triangular meshes in R. For those not familiar, the PLY format has two main components, a 3 by n matrix of vertex x,y,z coordinates, where n is the number of vertices, and a 3 by m matrix of faces where each number references one line from the vertex matrix, and so defining three corners of a triangular face. I am trying to find the mesh boundary edges, which are the "sides" of the triangles that are only referenced once in the faces matrix.
Therefore my question is, how do I find unique pairs of numbers across rows where there are three columns?
face 1 4 6 7
face 2 7 6 8
face 3 9 11 12
face 4 10 9 12
Here line (face) 1 has the edge 4-7 that only appears once, while 6-7 appears twice, as does 9-12.
unique() works across rows, but looks for unique rows, and expects the numbers to be in the same order. Any suggestions?
What you want to do is hash each pair, then make a table of the hashes. You also want (x,y)
to hash the same as (y,x).
R>data
V1 V2 V3 V4 V5
1 face 1 4 6 7
2 face 2 7 6 8
3 face 3 9 11 12
4 face 4 10 9 12
R>e1 <- pmin(data[3], data[4]) + pmax(data[3], data[4])/100
R>e2 <- pmin(data[3], data[5]) + pmax(data[3], data[5])/100
R>e3 <- pmin(data[4], data[5]) + pmax(data[4], data[5])/100
R>table(c(e1,e2,e3, recursive=TRUE))
4.06 4.07 6.07 6.08 7.08 9.1 9.11 9.12 10.12 11.12
1 1 2 1 1 1 1 2 1 1

Resources