I am trying to perform a Social Network Analysis of Congressional Roll Call data. The data I have comes as a csv, from voteview.com, and has the following format:
[Image: format of the CSV]
There are a large number of unique bills (identified by roll number), and I need to loop through them to see how often politicians (identified by icpsr) agree in their vote (given by cast_code).
However, I am unsure how to loop through this data frame, check whether two politicians voted the same way on a given bill, and then add that to a new data frame with three columns: [politician 1 | politician 2 | weight (how many times they voted the same way on unique bills)].
I have produced the following code when there was just a single bill being considered, which was able to get me a network map:
#1. creating a dataframe with all the yayers and one with all the nayers
yay_list <- S117 %>% filter(cast_code == '1')
nay_list <- S117 %>% filter(cast_code == '6')
#2. a list of the icpsr numbers who agree for yay and nay
y_list <- list(yay_list$icpsr)
n_list <- list(nay_list$icpsr)
#3. trying to use this list to make an igraph graph - BUT it does not recognise it
# I am not sure where to go next
make_ring(yay_list)
a1 <- as_adj_list(y_list)
#4. Alternative method - using only the columns icpsr & cast_code
# this will make an edge/adjacency style data frame
foo <- S117[, c("icpsr", "cast_code")]
library(plyr)
# define a function returning the edges for a single group
group.edges <- function(x) {
edges.matrix <- t(combn(x, 2))
colnames(edges.matrix) <- c("Sen_A", "Sen_B")
edges.df <- as.data.frame(edges.matrix)
return(edges.df)
}
# apply the function above to each group and bind altogether
all.edges <- do.call(rbind, lapply(unstack(foo), group.edges))
# add weights if needed
#all.edges$weight <- 1
#all.edges <- aggregate(weight ~ Sen_A + Sen_B, all.edges, sum)
all.edges
#convert to a dataframe for igraph
df <- data.frame(all.edges)
df
# use igraph function on new datafame and plot
g <- graph_from_data_frame(df)
print(g, e=TRUE, v=TRUE)
plot(g)
# a plot is produced, which is good, but I do not know how to do this for
# a situation where there are multiple bills - it seems very complicated
Does anyone have any advice on how I would create a similar style edge list data frame, ideally with weights (as there are many bills in the data frame not just 1)?
The weight should show how many times politicians vote the same way (either yay or nay) on unique bills.
Thanks!
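One possible way to get such a weighted edge list without an explicit loop is a self-join, sketched below in base R. The toy `votes` data frame and the column name `rollnumber` are assumptions standing in for the real voteview file, and only cast codes 1 and 6 are treated as yay and nay, as in the code above.

```r
# Toy stand-in for the voteview data (hypothetical values); the real file is
# assumed to have the columns rollnumber, icpsr and cast_code.
votes <- data.frame(
  rollnumber = c(1, 1, 1, 2, 2, 2),
  icpsr      = c(101, 102, 103, 101, 102, 103),
  cast_code  = c(1, 1, 6, 6, 6, 6)
)

# Keep only plain yay (1) and nay (6) votes, as in the question.
votes <- votes[votes$cast_code %in% c(1, 6), ]

# Self-join on bill and vote: each row of the result pairs two politicians
# who cast the same code on the same roll call.
pairs <- merge(votes, votes, by = c("rollnumber", "cast_code"),
               suffixes = c("_a", "_b"))

# Keep each unordered pair once and drop pairs of a politician with themselves.
pairs <- pairs[pairs$icpsr_a < pairs$icpsr_b, ]

# Count the number of agreements per pair.
pairs$weight <- 1
edges <- aggregate(weight ~ icpsr_a + icpsr_b, data = pairs, FUN = sum)
edges
```

The resulting `edges` data frame has two vertex columns followed by an edge attribute, which is the layout igraph's graph_from_data_frame() accepts, so it could be passed on with directed = FALSE. On the full data the self-join can get large, so it may be worth running it per chamber or per session.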
I have collected data from a survey in order to perform a choice based conjoint analysis.
I have preprocessed and cleaned the data with Python in order to use them in R.
However, when I apply the function dfidx to the dataset I get the following error: the two indexes don't define unique observations.
I really do not understand why. Before creating the .csv file I checked for duplicates with the pandas call final_df.duplicated().sum(), and its output was 0, meaning that there were no duplicate rows.
Can someone please help me understand what I am doing wrong?
Here is the code:
library(dfidx)
df <- read.csv('.../survey_results.csv')
df <- df[,-c(1)]
df$Platform <- as.factor(df$Platform)
df$Deposit <- as.factor(df$Deposit)
df$Fees <- as.factor(df$Fees)
df$Financial_Instrument <- as.factor(df$Financial_Instrument)
df$Leverage <- as.factor(df$Leverage)
df$Social_Trading <- as.factor(df$Social_Trading)
df.mlogit <- dfidx(df, idx = list(c("resp.id","ques"), "position"), shape='long')
Here is the link to the dataset that I am using https://github.com/AlbertoDeBenedittis/conjoint-survey-shiny/blob/main/survey_results.csv
Thank you in advance for your time.
The function dfidx() is built for data frames "for which observations are defined by two (potentially nested) indexes" (ref).
I don't think this function is built for more than two indexes. In your df, rows are unique only when all three columns you mention (resp.id, ques and position) are considered together; ques and position alone repeat across respondents.
One solution to this problem is to combine the two columns resp.id and ques into one (called, for example, resp.id.ques) with paste(...).
df$resp.id.ques <- paste(df$resp.id, df$ques, sep="_")
Then you can write the following line which should work just fine:
df.mlogit <- dfidx(df, idx = list("resp.id.ques", "position"))
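The uniqueness problem can be checked directly in base R before calling dfidx(). The sketch below uses a small made-up data frame (`df_toy` is hypothetical, with the same three column names as in the question) and anyDuplicated(), which returns 0 when no row is repeated.

```r
# Hypothetical toy data: 2 respondents, 2 questions each, 2 alternatives each.
df_toy <- data.frame(
  resp.id  = rep(1:2, each = 4),
  ques     = rep(rep(1:2, each = 2), times = 2),
  position = rep(1:2, times = 4)
)

# ques + position alone repeat across respondents, so they do not
# identify a row uniquely (anyDuplicated() returns a non-zero index).
dup_before <- anyDuplicated(df_toy[, c("ques", "position")])

# Combining resp.id and ques gives one id per choice situation.
df_toy$resp.id.ques <- paste(df_toy$resp.id, df_toy$ques, sep = "_")

# The combined id + position now identify every row uniquely.
dup_after <- anyDuplicated(df_toy[, c("resp.id.ques", "position")])

dup_before > 0   # TRUE
dup_after == 0   # TRUE
```

Running the same two checks on the real survey data should show whether the combined index resolves the error.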
My question is a follow-up to this question on imputation by group using "mice":
multiple imputation and multigroup SEM in R
The code in the answer works fine as far as the imputation part goes. But afterwards I am left with a list of complete data sets rather than a single one. The sample looks as follows:
# Set up data frame
df.g1<-data.frame(ID=rep("A",5),x1=floor(runif(5,0,2)),x2=floor(runif(5,10,20)),x3=floor(runif(5,100,150)))
df.g2<-data.frame(ID=rep("B",5),x1=floor(runif(5,0,2)),x2=floor(runif(5,25,50)),x3=floor(runif(5,200,250)))
df.g3<-data.frame(ID=rep("C",5),x1=floor(runif(5,4,5)),x2=floor(runif(5,75,99)),x3=floor(runif(5,500,550)))
df<-rbind(df.g1,df.g2,df.g3)
# Introduce NAs
df$x1[rbinom(15,1,0.1)==1]<-NA
df$x2[rbinom(15,1,0.1)==1]<-NA
df$x3[rbinom(15,1,0.1)==1]<-NA
df
# Impute values by group:
df.clean<-lapply(split(df,df$ID), function(x) mice::complete(mice(df,m=5)))
df.clean
As you can see, df.clean is a list of 3, one element per group. But each element already contains the kind of complete data set I am looking for.
The original answer suggests to rbind() the obtained data in df.clean which leaves me with a new data set with 45 (3x the original size) observations.
Here is the original code for the last step:
imputed.both <- do.call(args = df.clean, what = rbind)
Which data is the "right" one? And why the last step?
Thanks a bunch!
There's a bug in the code. I have an edited version below that works:
#Set up data frame
set.seed(12345)
df.g1<-data.frame(ID=rep("A",5),x1=floor(runif(5,0,2)),x2=floor(runif(5,10,20)),x3=floor(runif(5,100,150)))
df.g2<-data.frame(ID=rep("B",5),x1=floor(runif(5,0,2)),x2=floor(runif(5,25,50)),x3=floor(runif(5,200,250)))
df.g3<-data.frame(ID=rep("C",5),x1=floor(runif(5,4,5)),x2=floor(runif(5,75,99)),x3=floor(runif(5,500,550)))
df<-rbind(df.g1,df.g2,df.g3)
#Introduce NAs
df$x1[rbinom(15,1,0.1)==1]<-NA
df$x2[rbinom(15,1,0.1)==1]<-NA
df$x3[rbinom(15,1,0.1)==1]<-NA
# check NAs
colSums(is.na(df))
#Impute values by group:
# the bug was here: mice() must be called on x (the group), not on df
df.clean<-lapply(split(df,df$ID), function(x) mice::complete(mice(x,m=5)))
imputed.both <- do.call(args = df.clean, what = rbind)
dim(imputed.both)
# returns 15,4
In the code in the question, you have
df.clean<-lapply(split(df,df$ID), function(x) mice::complete(mice(df,m=5)))
dim(do.call(rbind,df.clean))
#this returns 45,4
The function is defined with the argument "x", but its body uses df from the global environment. Hence every group is imputed on the complete df, and you get three full copies of it.
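The effect of this scoping mistake can be seen without mice at all. The sketch below uses a made-up `df_toy` (a hypothetical stand-in with the same 3 groups of 5 rows) and simply counts the rows each anonymous function actually works on.

```r
# Hypothetical toy data: 3 groups of 5 rows each, as in the question.
df_toy <- data.frame(ID = rep(c("A", "B", "C"), each = 5), x1 = 1:15)

# Buggy version: the argument x is ignored and the global df_toy is used,
# so every group "sees" all 15 rows.
rows_wrong <- sapply(split(df_toy, df_toy$ID), function(x) nrow(df_toy))

# Fixed version: the function works on its argument, i.e. one group at a time.
rows_right <- sapply(split(df_toy, df_toy$ID), function(x) nrow(x))

sum(rows_wrong)  # 45, three copies of the full data
sum(rows_right)  # 15, each row used exactly once
```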
So to answer your question, if you do this step:
split(df,df$ID)
You split your data frame into a list of data frames, each containing only the rows for A, B or C. Then if you lapply through this list, you get
df.clean<-lapply(split(df,df$ID), function(x) mice::complete(mice(x,m=5)))
names(df.clean)
lapply(df.clean,dim)
Each item of the list df.clean contains an imputed subset of the original df, with ID being A, B or C. The last step combines this list back into a single data frame with the same 15 rows as the original:
imputed.both <- do.call(rbind,df.clean)
I'm working with data regarding people and what class of medicine they were prescribed. It looks something like this (the actual data is read in via txt file):
library(data.table)
test <- matrix(c(1,"a",1,"a",1,"b",2,"a",2,"c"),ncol=2,byrow=TRUE)
colnames(test) <- c("id","med")
test <- as.data.table(test)
test <- unique(test[, 1:2])
test
The table has about 5 million rows, 45k unique patients, and 49 unique medicines. Some patients have multiples of the same medicines, which I remove. Not all patients have every medicine. I want to make each of the 49 unique medicines into separate columns, and have each unique patient be a row, and populate the table with 1s and 0s to show if the patient has the medicine or not.
I was trying to use spread or dcast, but there's no value column. I tried to work around this by adding a column of 1s
test$true <- rep(1, nrow(test))
And then using tidyr
library(tidyr)
test_wide <- spread(test, med, true, fill = 0)
My original data produced the following error, but I'm not sure why the example data doesn't reproduce it:
Error: `var` must evaluate to a single number or a column name, not a list
Please let me know what I can do to make this a better reproducible example. Sorry, I'm really new to this.
It looks like you are trying to do one-hot encoding here. For this, see the "onehot" package and its documentation for details.
Code for reference:
library(onehot)
test <- matrix(c(1,"a",1,"a",1,"b",2,"a",2,"c"),ncol=2,byrow=TRUE)
colnames(test) <- c("id","med")
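test <- as.data.frame(test)
str(test)
# id is stored as text after the matrix conversion; convert it to numeric
test$id <- as.numeric(as.character(test$id))
# make med a factor explicitly (as.data.frame() leaves it as character in R >= 4.0)
test$med <- as.factor(test$med)
str(test)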
encoder <- onehot(test)
finaldata <- predict(encoder,test)
finaldata
Make sure that all the columns you want encoded are of type factor. Also, I have taken the liberty of changing the data.table to a data.frame.
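If the goal is exactly one row per patient, a base R alternative is to cross-tabulate id against med and turn the counts into 0/1 flags. This is only a sketch on the toy data from the question, not a tested solution for the full 5 million rows; `counts` and `wide` are names introduced here for illustration.

```r
# Toy data from the question, including the duplicated (1, "a") row.
test <- data.frame(
  id  = c(1, 1, 1, 2, 2),
  med = c("a", "a", "b", "a", "c")
)

# Cross-tabulate patients against medicines; duplicates just raise the count.
counts <- table(test$id, test$med)

# One row per patient, one column per medicine.
wide <- as.data.frame.matrix(counts)

# Collapse counts to presence/absence flags (1 = has the medicine).
wide[] <- lapply(wide, function(col) as.integer(col > 0))
wide
```

Because the counts are collapsed to flags, repeated prescriptions do not need to be removed beforehand.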
Suppose I assign data=Abortion (the Abortion data set from the ltm package). I have a function where one of the inputs is data.
When using the function, I write:
function.name(data=Abortion)
When writing the summary of the results, I want the name of the data set I used; in this case it is Abortion.
How can I get that name back?
In a more general sense: suppose I have an object named abc. I assign xyz=abc; how can I now get the name abc back?
I suggest rethinking your approach. I assume you are trying to loop through different datasets and collect results. Try the following example:
#dummy data
dat1 <- runif(10)
dat2 <- runif(10)
dat3 <- runif(10)
#my function
myfunc <- function(data) max(data)
#make a list - created manually here, but it can be built automatically, e.g.:
# lapply(list.files(),read.table)
all_dat <- list(dat1,dat2,dat3)
#add names to list
names(all_dat) <- c("dat1","dat2","dat3")
#loop through dat1,2,3
sapply(all_dat,myfunc)
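If the name is needed inside a function that is called directly, the usual base R idiom is deparse(substitute(...)), which captures the expression the caller wrote for an argument. A minimal sketch follows; `describe_data` and `my_scores` are hypothetical names used only for illustration.

```r
# Hypothetical function that reports the name of the object passed as `data`.
describe_data <- function(data) {
  # substitute() captures the unevaluated argument expression,
  # deparse() turns it into a character string.
  data_name <- deparse(substitute(data))
  list(name = data_name, maximum = max(data))
}

my_scores <- c(3, 9, 4)
result <- describe_data(data = my_scores)
result$name     # "my_scores"
result$maximum  # 9
```

This only recovers what was typed in the call. Inside sapply() or lapply() the captured expression is an internal placeholder rather than the original name, which is why the named list above is the more robust route. Likewise, after xyz = abc the object xyz carries no record of the name abc.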