R Dummies for subsetting data - r

I have data on the GPS bearing (0-360 degrees) from one place to many other places (A-Z).
I want to create 4 columns of dummy variables, one each for 0-89 degrees, 90-179 degrees, 180-269 degrees, and 270-360 degrees, so that each observation (A-Z) has a 1 in the column that corresponds to its bearing and a 0 in the other three.
Thanks all!

You can use model.matrix in combination with cut. cut creates a factor with the grouping, and model.matrix generates the matrix of dummy variables from that factor.
x <- c(0, 67, 90, 183, 352)
res <- model.matrix(~ cut(x, c(-1, 89, 179, 269, 360))-1)
The output is then:
  cut(x, c(-1, 89, 179, 269, 360))(-1,89]
1                                       1
2                                       1
3                                       0
4                                       0
5                                       0
  cut(x, c(-1, 89, 179, 269, 360))(89,179]
1                                        0
2                                        0
3                                        1
4                                        0
5                                        0
  cut(x, c(-1, 89, 179, 269, 360))(179,269]
1                                         0
2                                         0
3                                         0
4                                         1
5                                         0
  cut(x, c(-1, 89, 179, 269, 360))(269,360]
1                                         0
2                                         0
3                                         0
4                                         0
5                                         1
attr(,"assign")
[1] 1 1 1 1
attr(,"contrasts")
attr(,"contrasts")$`cut(x, c(-1, 89, 179, 269, 360))`
[1] "contr.treatment"
The column names aren't pretty and should probably be changed, but the matrix has the structure you want: exactly one 1 per row.
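A minimal sketch of the same cut + model.matrix idea with readable column names. The names bearing, quadrant, dummies and the deg_* labels are made up for illustration. It also assumes half-open bins ([0,90), [90,180), ...) are acceptable, so that a non-integer bearing such as 89.5 still lands in the first group.

```r
# Example bearings in degrees (same values as x above)
bearing <- c(0, 67, 90, 183, 352)

# right = FALSE gives bins [0,90), [90,180), [180,270), [270,360];
# with right = FALSE, include.lowest = TRUE closes the last bin so 360 is kept
quadrant <- cut(bearing,
                breaks = c(0, 90, 180, 270, 360),
                labels = c("deg_0_89", "deg_90_179", "deg_180_269", "deg_270_360"),
                right = FALSE, include.lowest = TRUE)

# One 0/1 column per level; "- 1" drops the intercept so no level is omitted
dummies <- model.matrix(~ quadrant - 1)

# Replace the long generated names with the factor labels
colnames(dummies) <- levels(quadrant)
dummies
```

The result can be attached to the original data with cbind(), for example cbind(your_data, dummies), where your_data stands for whatever data frame holds the bearings.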

Step 1: Bin the temperature using cut
Step 2: Create the dummies with createDummyFeatures (from the mlr library)
install.packages("mlr")
library(mlr)
# Build the data frame directly so temperature stays numeric
# (going through cbind() would coerce both columns to character)
a <- data.frame(state=c("a","b","c","d","e","f","g"),
temperature=c(0,12,89,90,180,350,360))
# Bin into four labelled ranges; include.lowest=TRUE keeps the value 0
a$bucket <- cut(a$temperature,c(0,89,179,269,360),
labels=c("0-89","90-179","180-269","270-360"),include.lowest=TRUE)
# One 0/1 column per level of bucket
createDummyFeatures(a,cols="bucket")
My Output:
|sno |state |temperature |bucket.0.89 |bucket.90.179 |bucket.180.269 |bucket.270.360
|1 |a |0 |1 |0 |0 |0
|2 |b |12 |1 |0 |0 |0
|3 |c |89 |1 |0 |0 |0
|4 |d |90 |0 |1 |0 |0
|5 |e |180 |0 |0 |1 |0
|6 |f |350 |0 |0 |0 |1
|7 |g |360 |0 |0 |0 |1
Let me know in case of any queries.


Is it possible to output table ordered by group and limited per group?

I have a database with a table of cars, and the table has a number of different columns. I need to output the content of that table ordered by the Make of each car. Only three cars from each make should be output, alongside the total of the score columns in each car's row. Within each make, the rows should be in descending order of that total, accompanied by a column called Ranking that counts up from 1 to however many rows are output.
Below is a sample from my database table
|Timestamp |Email |Name |Year|Make |Model |Car_ID|Judge_ID|Judge_Name|Racer_Turbo|Racer_Supercharged|Racer_Performance|Racer_Horsepower|Car_Overall|Engine_Modifications|Engine_Performance|Engine_Chrome|Engine_Detailing|Engine_Cleanliness|Body_Frame_Undercarriage|Body_Frame_Suspension|Body_Frame_Chrome|Body_Frame_Detailing|Body_Frame_Cleanliness|Mods_Paint|Mods_Body|Mods_Wrap|Mods_Rims|Mods_Interior|Mods_Other|Mods_ICE|Mods_Aftermarket|Mods_WIP|Mods_Overall|
|--------------|-------------------------|----------|----|--------|---------|------|--------|----------|-----------|------------------|-----------------|----------------|-----------|--------------------|------------------|-------------|----------------|------------------|------------------------|---------------------|-----------------|--------------------|----------------------|----------|---------|---------|---------|-------------|----------|--------|----------------|--------|------------|
|8/5/2018 14:10|honoland13#japanpost.jp |Hernando |2015|Acura |TLX |48 |J04 |Bob |0 |0 |2 |2 |4 |4 |0 |2 |4 |4 |2 |4 |2 |2 |2 |2 |2 |0 |4 |4 |4 |6 |2 |0 |4 |
|8/5/2018 15:11|nlighterness2q#umn.edu |Noel |2015|Jeep |Wrangler |124 |J02 |Carl |0 |6 |4 |2 |4 |6 |6 |4 |4 |4 |6 |6 |6 |6 |6 |4 |6 |6 |6 |6 |6 |4 |6 |4 |6 |
|8/5/2018 17:10|eguest47#microsoft.com |Edan |2015|Lexus |Is250 |222 |J05 |Adrian |0 |0 |0 |0 |0 |0 |0 |0 |6 |6 |6 |0 |0 |6 |6 |6 |0 |0 |0 |0 |0 |0 |0 |0 |4 |
|8/5/2018 17:34|hchilley40#fema.gov |Hieronymus|1993|Honda |Civic eG |207 |J06 |Aaron |0 |0 |2 |2 |2 |2 |2 |2 |0 |4 |2 |2 |2 |2 |2 |2 |4 |2 |2 |0 |0 |0 |2 |2 |0 |
|8/5/2018 14:30|nnowick3d#tuttocitta.it |Nickolas |2016|Ford |Mystang |167 |J02 |Carl |0 |0 |2 |2 |0 |2 |2 |0 |0 |0 |0 |2 |0 |2 |2 |2 |0 |0 |2 |0 |0 |0 |0 |0 |2 |
|8/5/2018 16:12|mdearl39#amazon.co.uk |Martin |2013|Hyundai |Gen coupe|159 |J04 |Bob |0 |0 |2 |0 |0 |0 |2 |0 |0 |0 |0 |2 |0 |2 |2 |0 |2 |0 |2 |0 |0 |0 |0 |0 |0 |
|8/5/2018 17:00|alynamg#blogtalkradio.com|Aldridge |2009|Infiniti|G37 |20 |J06 |Aaron |2 |0 |2 |2 |0 |0 |2 |0 |0 |2 |2 |2 |2 |2 |2 |2 |2 |2 |4 |2 |2 |0 |2 |0 |2 |
|8/5/2018 16:11|abowton3k#spiegel.de |Ambros |2009|Honda |Oddesy |178 |J06 |Aaron |2 |0 |2 |2 |2 |2 |2 |0 |4 |4 |2 |2 |2 |4 |4 |4 |2 |2 | |6 |4 |4 |6 |4 |6 |
|8/5/2018 17:29|qesterbrookn#bandcamp.com|Quincy |2012|Hyundai |Celoster |30 |J04 |Bob |0 |0 |2 |2 |2 |2 |2 |4 |6 |6 |4 |2 |4 |4 |6 |6 |4 |0 |2 |0 |0 |0 |2 |2 |4 |
The expected output is something like this below
|Ranking |Car_ID|Year |Make |Model |Total|
|--------|------|-------|------|-----------|-----|
|1 |48 |2015 |Acura |TLX |89 |
|2 |66 |2012 |Acura |MDX |75 |
|3 |101 |2022 |Acura |TLX |70 |
|4 |22 |2011 |Chevy |Camaro |112 |
|5 |40 |2015 |Chevy |Corvette |99 |
|6 |205 |2022 |Chevy |Corvette |66 |
|7 |111 |2006 |Ford |Mustang |94 |
|8 |97 |2003 |Ford |GT |88 |
|9 |71 |2008 |Ford |Fiesta ST |80 |
Here's the command I've been able to put together, which does something similar to what I need, but I can't figure out how to add the ranking column and order by the total in descending order.
SELECT Car_ID, Year, Make, Model, Racer_Turbo + Racer_Supercharged + ... + Mods_Overall FROM Carstable order by Make limit 3;
This query returned only three rows in total instead of three per make. I also can't figure out where to put the DESC keyword so that rows are listed in descending order of the total column, or how to produce the ranking column. Any ideas?
Use a CTE which returns the column Total for each row and ROW_NUMBER() window function to pick the first 3 rows for each Make and to create the column Ranking:
WITH cte AS (
SELECT *,
Racer_Turbo + Racer_Supercharged + Racer_Performance + Racer_Horsepower +
Car_Overall +
Engine_Modifications + Engine_Performance + Engine_Chrome + Engine_Detailing + Engine_Cleanliness +
Body_Frame_Undercarriage + Body_Frame_Suspension + Body_Frame_Chrome + Body_Frame_Detailing + Body_Frame_Cleanliness +
Mods_Paint + Mods_Body + Mods_Wrap + Mods_Rims + Mods_Interior + Mods_Other + Mods_ICE + Mods_Aftermarket + Mods_WIP + Mods_Overall Total
FROM carstable
)
SELECT ROW_NUMBER() OVER (ORDER BY Make, Total DESC) Ranking,
Car_ID, Year, Make, Model, Total
FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY Make ORDER BY Total DESC) rn FROM cte)
WHERE rn <= 3
ORDER BY Make, Total DESC;

Searching mathematical algorithm for vector calculation

I am given the orientations of 3D objects like this:
Direction1:
X-Vector:
X_X: 1
X_Y: 0
X_Z: 0
Y-Vector:
Y_X: 0
Y_Y: 1
Y_Z: 0
Z-Vector:
Z_X: 0
Z_Y: 0
Z_Z: 1
Direction2:
X-Vector:
X_X: 0
X_Y: 0
X_Z: 1
Y-Vector:
Y_X: 0
Y_Y: -1
Y_Z: 0
Z-Vector:
Z_X: 1
Z_Y: 0
Z_Z: 0
That looks like this (Direction1 on the left, Direction2 on the right):
Now I have to extract the rotation that takes Direction1 to Direction2.
There are algorithms that, for example, calculate the rotation from vector 1 to vector 2, but here I have 3 vectors and I don't know how to calculate the Euler rotation angles.
I thought about summing the 3 vectors into 1, so picture 1 would be (1,1,1) and picture 2 would be (1,-1,1), but then the information about which axis points in which direction gets lost.
Does anybody have an idea?
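It seems that you want to find the affine transformation that maps one triplet of non-coplanar vectors onto another triplet.
Build matrices A and B from the two triplets and let M be the unknown rotation matrix.
Here a column such as (x1, y1, z1) is your X_X, X_Y, X_Z and so on; the primed entries belong to Direction2.
\[ M A = B \]
\[
M
\begin{pmatrix}
x_1 & x_2 & x_3 & 0 \\
y_1 & y_2 & y_3 & 0 \\
z_1 & z_2 & z_3 & 0 \\
1 & 1 & 1 & 1
\end{pmatrix}
=
\begin{pmatrix}
x_1' & x_2' & x_3' & 0 \\
y_1' & y_2' & y_3' & 0 \\
z_1' & z_2' & z_3' & 0 \\
1 & 1 & 1 & 1
\end{pmatrix}
\]
Find the inverse matrix InvA = A^{-1} of A and multiply both sides by InvA on the right:
\[ M A A^{-1} = B A^{-1} \]
\[ M I = B A^{-1} \]
\[ M = B A^{-1} \]
Now you have the matrix M needed to transform the vectors.
To rotate about the point (5, 0, 0) instead of the origin, first translate that point to the origin, then apply M, then translate back. With column vectors the rightmost matrix is applied first:
\[
M' =
\begin{pmatrix}
1 & 0 & 0 & 5 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
M
\begin{pmatrix}
1 & 0 & 0 & -5 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\]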
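As a rough illustration, the M = B * InvA step can be checked numerically with the two orientations from the question. This sketch assumes pure direction vectors with no translation, so plain 3x3 matrices are enough and the homogeneous fourth row and column are left out. It reports a single rotation angle (angle-axis form) rather than Euler angles, and the variable names are made up for the example.

```r
# Columns are the X, Y and Z axis vectors of each orientation
A <- cbind(X = c(1, 0, 0), Y = c(0, 1, 0), Z = c(0, 0, 1))   # Direction1
B <- cbind(X = c(0, 0, 1), Y = c(0, -1, 0), Z = c(1, 0, 0))  # Direction2

# M maps every column of A onto the matching column of B
M <- B %*% solve(A)

# A proper rotation has determinant +1 and satisfies M %*% t(M) = I
is_rotation <- isTRUE(all.equal(det(M), 1)) &&
  isTRUE(all.equal(M %*% t(M), diag(3), check.attributes = FALSE))

# Rotation angle from the trace; the argument is clamped to [-1, 1]
# to guard against rounding error before calling acos()
cos_angle <- max(-1, min(1, (sum(diag(M)) - 1) / 2))
angle_deg <- acos(cos_angle) * 180 / pi
```

For these inputs A is the identity, so M equals B, and the angle comes out as 180 degrees. If is_rotation were FALSE (for example a determinant of -1), the two triplets would differ by a reflection and no pure rotation could map one onto the other.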

How to make nested ifelse loop dynamic

I have a data frame abc as below
Sr|VALUE
a |85
b |120
c |145
d |225
e |100
f |325
g |410
I wrote the code below to create a count for each record such that it is 0 for VALUE < 100, 1 for VALUE in [100, 200), and 2 for VALUE >= 200
Stepdif<-100
abc = within(abc, {
Count = ifelse(abc$VALUE>=Stepdif & abc$VALUE<2*Stepdif,1,ifelse(abc$VALUE>=2*Stepdif ,2,0))
})
to give result as
Sr|VALUE|Count
a |85 |0
b |120 |1
c |145 |1
d |225 |2
e |100 |1
f |325 |2
g |410 |2
Now I want code that lets me define the count for every interval of 100. I don't want to write code like this:
Count = ifelse(abc$VALUE>=Stepdif & abc$VALUE<2*Stepdif,1,ifelse(abc$VALUE>=2*Stepdif & abc$VALUE<3*Stepdif,2,ifelse(abc$VALUE>=3*Stepdif & abc$VALUE<4*Stepdif,3,ifelse(abc$VALUE>=4*Stepdif ,4,0))))
Rather, I want to make it dynamic, so that if I change the number of steps from 4 to 6, I don't have to rewrite the code.
Expected result:
Sr|VALUE|Count
a |85 |0
b |120 |1
c |145 |1
d |225 |2
e |100 |1
f |325 |3
g |410 |4
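Hope this helps. findInterval counts how many break points each value has reached, so the number of steps becomes a parameter:
funfun <- function(x, n, step = 100) findInterval(x, (1:n) * step)
funfun(abc$VALUE, 2)
[1] 0 1 1 2 1 2 2
funfun(abc$VALUE, 4)
[1] 0 1 1 2 1 3 4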
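Because the intervals all have the same width, the count can also be sketched with integer division instead of nested ifelse calls. This assumes VALUE is never negative. max_count is a name invented here for the highest count allowed.

```r
# Data frame from the question
abc <- data.frame(Sr = letters[1:7],
                  VALUE = c(85, 120, 145, 225, 100, 325, 410))

Stepdif   <- 100  # width of each interval
max_count <- 4    # counts are capped here; change to 6 for six steps

# VALUE %/% Stepdif is the number of whole steps each value has passed;
# pmin() caps it so everything at or above max_count * Stepdif gets max_count
abc$Count <- pmin(abc$VALUE %/% Stepdif, max_count)
abc
```

With max_count set to 2 the same line reproduces the original three-level coding (0, 1, 2).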

R data frame - Include NAs in aggregation [duplicate]

With a data frame df1 like below
+-----------------------------------------+
|reg |make |model |year|abs |gears|fm|
+-----------------------------------------+
|ax1234|Toyota|Corolla|1999|true |6 |0 |
|ax1235|Toyota|Corolla|1999|false|5 |0 |
|ax1236|Toyota|Corolla|1992|false|4 |NA|
|ax1237|Toyota|Camry |2001|true |7 |1 |
|ax1238|Honda |Civic |1994|true |5 |NA|
|ax1239|Honda |Civic |2000|false|6 |0 |
|ax1240|Honda |Accord |1992|false|4 |NA|
|ax1241|Nissan|Sunny |2001|true |6 |0 |
|ax1242| | |1998|false|6 |0 |
|ax1243|NA |NA |1992|false|4 |NA|
+-----------------------------------------+
When aggregating as below, the row whose make is NA is dropped. I want to keep it in the result. How can I achieve this? It is fine if the empty make and the NA make are combined.
> aggregate(reg ~ make, df1, length)
make reg
1 1
2 Honda 3
3 Nissan 1
4 Toyota 4
We can use dplyr. group_by keeps NA as a group of its own, so the result includes the NA count as well:
library(dplyr)
df1 %>%
group_by(make) %>%
summarise(reg = n())
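For comparison, a base R sketch that also keeps the NA group, using table with useNA = "ifany". Only the reg and make columns from the question are rebuilt here, and counts is a name chosen for the example.

```r
# reg and make columns from the question's df1
df1 <- data.frame(
  reg  = paste0("ax", 1234:1243),
  make = c("Toyota", "Toyota", "Toyota", "Toyota",
           "Honda", "Honda", "Honda", "Nissan", "", NA)
)

# useNA = "ifany" adds a cell for NA whenever NA occurs in the data;
# responseName names the count column to match the aggregate() output
counts <- as.data.frame(table(make = df1$make, useNA = "ifany"),
                        responseName = "reg")
counts
```

The empty-string make and the NA make stay as two separate rows here. Recoding "" to NA beforehand would merge them into one.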

Recode Variable in R after matching with another Data Frame

I have 2 dataframes in R,
DF1
|attr1|attr2|attr3|
|5 |4 |9 |
|4 |30 |2 |
|5 |18 |1 |
|3 |1 |7 |
|6 |30 |0 |
|8 |18 |12 |
Now I'm trying to recode the values in the attr2 column of this data frame: if the value in attr2 is present in the Var1 column of DF2, it should be recoded as 1, otherwise as 0. The second data frame is simply a count of the top 2 unique values within attr2.
DF2
|Var1|Freq|
|30 |2 |
|18 |2 |
I want the result to be in the format of something as follows:
|attr1|attr2|attr3|
|5 |0 |9 |
|4 |1 |2 |
|5 |1 |1 |
|3 |0 |7 |
|6 |1 |0 |
|8 |1 |12 |
Thanks for the help!
We can use
library(dplyr)
DF1 %>%
mutate(attr2 = as.integer(attr2 %in% DF2$Var1))
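The same recoding can be sketched in base R without dplyr. The two data frames are rebuilt from the tables in the question, assuming Var1 is stored as a number.

```r
# Data frames from the question
DF1 <- data.frame(attr1 = c(5, 4, 5, 3, 6, 8),
                  attr2 = c(4, 30, 18, 1, 30, 18),
                  attr3 = c(9, 2, 1, 7, 0, 12))
DF2 <- data.frame(Var1 = c(30, 18), Freq = c(2, 2))

# %in% returns TRUE where attr2 appears in DF2$Var1;
# as.integer() turns TRUE/FALSE into 1/0
DF1$attr2 <- as.integer(DF1$attr2 %in% DF2$Var1)
DF1
```

This overwrites attr2 in place. Assigning to a new column name instead keeps the original values available.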
