How to use pandas Grouper on multiple keys? - datetime

I need to groupby-transform a dataframe by a datetime column AND a string (object) column, applying a function per group and assigning the result to every row in that group. I understand the groupby workflow but cannot build a pandas.Grouper for both conditions at the same time. Thus:
How to use pandas.Grouper on multiple columns?

Use DataFrame.groupby with a list of pandas.Grouper objects as the by argument, like this:
df['result'] = df.groupby([
    pd.Grouper(key='dt', freq='D'),
    pd.Grouper(key='other_column')
]).transform(foo)
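For a runnable illustration of this kind of call, here is a minimal sketch. The sample data, the `value` column, and the use of `"sum"` in place of `foo` are assumptions made for the example, not part of the original question.

```python
import pandas as pd

# Hypothetical sample data; 'value' is an assumed numeric column to transform.
df = pd.DataFrame({
    "dt": pd.to_datetime([
        "2021-01-01 08:00", "2021-01-01 09:15",
        "2021-01-01 17:30", "2021-01-02 10:00",
    ]),
    "other_column": ["a", "b", "a", "a"],
    "value": [1.0, 10.0, 2.0, 100.0],
})

# One Grouper bins 'dt' by calendar day, the other groups on the string column.
groups = df.groupby([
    pd.Grouper(key="dt", freq="D"),
    pd.Grouper(key="other_column"),
])

# Selecting a single column first makes transform return a Series,
# which can be assigned to one new column. "sum" stands in for foo.
df["result"] = groups["value"].transform("sum")
print(df)
```

Selecting `value` before calling transform is a design choice for the sketch: on a frame with several non-key columns, transform on the whole groupby returns a DataFrame, which cannot be assigned to a single column.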

If your second column is not a datetime series, you can pass its name directly alongside the Grouper for the datetime column, like this:
df['res'] = df.groupby([
    pd.Grouper(key='dt', freq='D'),
    'other_column'
]).transform(foo)
Note that in this case you don't have to use pd.Grouper for the second column because it holds strings rather than datetimes. pd.Grouper also accepts such a column through key, but its freq argument only applies to datetime-like values, so the wrapper adds nothing there.
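As a quick check, the sketch below compares grouping the string column by plain name against wrapping it in a `pd.Grouper` without `freq`. The data, the `value` column, and the `spread` function (a stand-in for `foo`) are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; 'value' is an assumed numeric column.
df = pd.DataFrame({
    "dt": pd.to_datetime([
        "2021-01-01 08:00", "2021-01-01 09:15",
        "2021-01-01 17:30", "2021-01-02 10:00",
    ]),
    "other_column": ["a", "b", "a", "a"],
    "value": [1.0, 10.0, 2.0, 100.0],
})


def spread(s):
    """Hypothetical stand-in for foo: range of values within the group."""
    return s.max() - s.min()


daily = pd.Grouper(key="dt", freq="D")

# Second key given as a plain column name.
by_name = df.groupby([daily, "other_column"])["value"].transform(spread)

# Second key wrapped in a Grouper with no freq.
by_grouper = df.groupby(
    [daily, pd.Grouper(key="other_column")]
)["value"].transform(spread)

same = by_name.equals(by_grouper)
print(same)
```

On this data both spellings should give the same per-row result, so the shorter plain-name form is usually enough for non-datetime keys.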

Related

Is there an R methodology to select the columns from a dataframe that are listed in a separate array

I have a dataframe with over 100 columns. Post implementation of certain conditions, I need a subset of the dataframe with the columns that are listed in a separate array.
The array has 50 entries with 2 columns. The first column has the selected variable names and the second column has some associated values.
I wish to build a new data frame with just the variables mentioned in the first column of the separate array. Could you please point me to how to proceed?
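Try this:
library(dplyr)
iris <- iris %>% select(all_of(dataframe_with_names$names))
all_of() keeps exactly the columns named in the vector; contains() would instead match any column whose name contains one of the strings.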
In R you can use square brackets [rows, columns] to select specific rows or specific columns. (Leaving either blank selects all).
If you had a vector of column names you wanted to keep called important_columns you could select only those columns with:
myData[,important_columns]
In your case the vector of column names is actually a column in your array. So you select that column and use it as your vector:
myData[, array$names]

How to reference a column in a dataframe through a variable in R

I'm trying to iterate through columns in an R data.frame.
To do so, I'm hoping to write a for loop which loops over the column names and then filters the data.table accordingly with values.
My issue is that given the syntax:
df[which(df$XX == y), ]
XX needs to actually be a column name versus a variable that is a string equivalent to the column name.
Is there a way to loop over the columns via inputting a variable?
Many thanks!

Create Column in Data Frame that Indicates Repeated Value in Another Column

Say I have a data table like this in R:
Data Table
And I want to add a column to this table that indicates whether the person switched majors (like "Y" for switched, "N" for didn't switch). How would I do that? I've tried using the count and unique functions but don't know how to proceed.
You can simply add a column IsSwitched using the by clause of data.table:
DT[, IsSwitched:= ifelse(.N>1,"Y","N"), by=Id]
Where DT is your data.table.

which() function in R

I have a data frame(A) of size (92047x2) and a list(B) of size (1829). I want to create a new data frame with all rows of A whose first column value is present in B.
How to use which()? Or any other good way to approach this?
All the values are in form of character. (Eg. "Vc2345")
You can do it like that:
dfA=data.frame(C1=sample(1:92047), C2=sample(1:92047))
listB=list(sample(1:1829))
dfAinB=dfA[which(dfA$C1 %in% unlist(listB)),]
str(dfAinB)

Unique function leaves out column

Say I use R's unique function in a script to pull particular columns out of a premade dataframe to make a new one:
SUPSCIARIDS<-unique(SuperiorSciarids[,c(36,2,3,4:34)])
36-LOGID
2-Decay
3-Diameter
4:34 are the species
Why do you think the new data frame does not show column 2?
I've found the answer: that annoying dummy column "row.names" that often appears before column one.
I was counting row.names as the first column in my matrix, so if I add 1 to the beginning of the column query it brings up the Decay variable in my unique table. BINGO! And how embarrassing...
SUPSCIARIDS<-unique(SuperiorSciarids[,c(35,1,2,3,4:34)])
