I have networks of funders -> recipients (the first two columns of a data frame), built with graph_from_data_frame() from igraph in R.
Further columns in the data frame hold the donor/recipient type (type of institution: uni, gov, ri, etc.) and the total USD invested (integer).
I'd like to colour-code the nodes/vertices by organisation type and size the nodes by total USD.
Example of my data frame:
'''
Donor Recipient Recipient.type Total.USD
NIH UCLA Univ 122
WHO Vax.PLC Firm 80
Wellcome LSTHM Org 104
'''
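One way this could be done, as a minimal sketch rather than a definitive answer: set the vertex attributes `color` and `size`, which igraph's plot() picks up automatically. The sketch assumes donors have no type column (as in the example rows) and labels them "Donor"; the attribute names `org_type` and `usd`, the palette, and the size scaling are illustrative choices, not from the question.

```r
library(igraph)

# Example rows from the question
df <- data.frame(
  Donor = c("NIH", "WHO", "Wellcome"),
  Recipient = c("UCLA", "Vax.PLC", "LSTHM"),
  Recipient.type = c("Univ", "Firm", "Org"),
  Total.USD = c(122, 80, 104),
  stringsAsFactors = FALSE
)

# Columns after the first two become edge attributes (here Total.USD)
g <- graph_from_data_frame(df, directed = TRUE)

# Organisation type per vertex; vertices that never appear as a
# recipient are labelled "Donor" (an assumption for this sketch)
rtype <- df$Recipient.type[match(V(g)$name, df$Recipient)]
V(g)$org_type <- ifelse(is.na(rtype), "Donor", rtype)

# Total USD given or received by each vertex (weighted degree)
V(g)$usd <- strength(g, mode = "all", weights = E(g)$Total.USD)

# One colour per organisation type
types <- unique(V(g)$org_type)
pal <- setNames(hcl.colors(length(types), "Dark 3"), types)
V(g)$color <- unname(pal[V(g)$org_type])

# Scale node size by USD (the constants 10 and 20 are arbitrary)
V(g)$size <- 10 + 20 * V(g)$usd / max(V(g)$usd)

plot(g, edge.arrow.size = 0.4, vertex.label.cex = 0.8)
legend("topleft", legend = names(pal), fill = pal, bty = "n")
```

If the real data also has a donor type column, the "Donor" fallback can be replaced by a lookup on that column.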
I am applying a function to a data frame using mapply(), but the result doesn't keep certain information from the initial data frame that I would like to keep. The function I am applying is not vectorized.
DATA FRAME
Dealer  Cities                Zip    Radius in miles
A       Rancho Cucamonga, CA  91730  40
A       San Bernardino, CA    92401  40
B       Chino, CA             91710  40
B       Fontana, CA           92337  40
I am applying a function that gets all zip codes in the given mile radius of the initial zip.
remotes::install_github("EAVWing/ZipRadius")
results <- with(city_names, mapply(ZipRadius::zipRadius,
                                   as.character(`Center Zip Code`), `Radius in miles`,
                                   SIMPLIFY = FALSE))
RESULT
The result is a large list containing a data frame for each time the function zipRadius was called.
DESIRED RESULT
Each data frame contains the zip code used by the function and the data the function generates, but I would like it to also keep the corresponding "Dealer" column associated with each initial Zip.
For example, the above data frame is generated from the zip 91730, which has a Dealer value of A.
DEALER ZIP Other columns generated by the function...
A 90001
A 90002
A 90011
@Onyambu gave me a great solution in the comments (source):
dplyr::left_join(city_names, dplyr::bind_rows(results, .id = 'zip1'),
                 by = c(Zip = 'zip1'))
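For comparison, here is a base-R sketch of another way to carry the Dealer along: iterate over Dealer together with the zip and radius, and attach it to each result before stacking. `fake_zip_radius()` is a hypothetical stand-in for ZipRadius::zipRadius() so the example runs without that package; its output columns and the column name `Center.Zip` are made up.

```r
# Hypothetical stand-in for ZipRadius::zipRadius(): returns a small data frame
fake_zip_radius <- function(zip, radius) {
  data.frame(
    zip = paste0(substr(zip, 1, 3), c("01", "02")),
    distance = c(1.5, radius / 2),
    stringsAsFactors = FALSE
  )
}

city_names <- data.frame(
  Dealer = c("A", "A", "B", "B"),
  Zip = c("91730", "92401", "91710", "92337"),
  Radius = c(40, 40, 40, 40),
  stringsAsFactors = FALSE
)

# Map() walks the three columns in parallel, so each call knows its Dealer
results <- Map(function(dealer, zip, radius) {
  out <- fake_zip_radius(zip, radius)
  # The scalar Dealer and centre zip are recycled over the rows of `out`
  cbind(Dealer = dealer, Center.Zip = zip, out, stringsAsFactors = FALSE)
}, city_names$Dealer, city_names$Zip, city_names$Radius)

# Stack the per-zip data frames into one
combined <- do.call(rbind, results)
rownames(combined) <- NULL
print(combined)
```

The join-based solution keeps every column of city_names; this variant keeps only the columns attached explicitly inside the function.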
My dataset looks something like this. Note that the data below is hypothetical.
Objective: a sales employee has to go to a particular location and verify the houses/stores/buildings, and a device captures the information listed below.
Sr.No.  Store_Name    Phone-No.  Agent_id  Area            Lat-Long
1       ABC Stores    89099090   121       Bay Area        23.909090,89.878798
2       Wuhan Masks   45453434   122       Santa Fe        24.452134,78.123243
3       Twitter Cafe  67556090   123       Middle East     11.889766,23.334483
4       abc           33445569   121       Santa Cruz      23.345678,89.234213
5       Silver Gym    11004110   234       Worli Sea Link  56.564311, 78.909087
6       CK Clothings  00908876   223       90 th Street    34.445887, 12.887654
Facts:
#1 Unique identifier for finding duplicates: check Sr.No. 1 and 4, which are basically the same store.
In this dummy dataset every column can be manipulated for the same store/house/building/outlet:
a) The name is entered manually, so the same house/store can be entered in the system under a different name, and multiple visits can be recorded.
b) The mobile number can also be manipulated: a different number can be associated with the same outlet.
c) The lat-long captured by the agent's device can also be fudged by moving slightly closer to or further from the building.
Problem:
How can I use the lat-long data as the unique identifier for finding duplicates in this huge dataset, keeping point c) above in mind?
Deploying QR codes is not very helpful either, as they can also be tampered with.
The goal is to stop fraudulent practice by employees (the same employee can visit the same store/outlet again, or a different employee can revisit it, to inflate the visit count).
Right now the lat-long column is the only thing I can think of for building a UID; please feel free to suggest anything else that could be used.
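One possible direction, sketched under assumptions: because of point c) the coordinates will never match exactly, so instead of using the raw lat-long as a key, visits within some distance of each other could be grouped and given a shared location id. The sketch below uses the haversine distance and single-linkage clustering cut at a threshold. The 50 m threshold, the names `haversine_m` and `location_id`, and the third coordinate pair (a point a few metres from the first) are illustrative, not from the dataset.

```r
# Great-circle distance in metres between two lat/long points (haversine)
haversine_m <- function(lat1, lon1, lat2, lon2) {
  r <- 6371000            # mean Earth radius in metres
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Illustrative visits; row 3 is a made-up point close to row 1
visits <- data.frame(
  sr_no = 1:4,
  lat = c(23.909090, 24.452134, 23.909200, 11.889766),
  lon = c(89.878798, 78.123243, 89.878850, 23.334483)
)

# Pairwise distance matrix
n <- nrow(visits)
d <- outer(seq_len(n), seq_len(n), function(i, j) {
  haversine_m(visits$lat[i], visits$lon[i], visits$lat[j], visits$lon[j])
})

# Visits linked by a chain of neighbours closer than the threshold
# share a location id (assumed threshold; tune to GPS accuracy)
threshold_m <- 50
visits$location_id <- cutree(hclust(as.dist(d), method = "single"),
                             h = threshold_m)
print(visits)
```

A full pairwise matrix grows quadratically, so for a huge dataset the rows would likely need to be bucketed first (for example by area or by rounded coordinates) before clustering within each bucket.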
I am currently manipulating data in R. My dataset has CHILDID (fname), Channels, delta, alpha and other attributes. It is EEG data (see the picture below). Every CHILDID (fname) has 14 channels (AF3, AF4, F3, F8, O1, P7, T8, etc.). I also have a group table which groups the channels into three categories (1, 2, 3) for every CHILDID (fname).
My task is to add a column named group to the data frame, stating the group number of every channel.
The groups variable is present in table form as follows:
groups<-cutree(hc2, k=3)
print(groups)
The final outcome should be like this:
fname Channel delta theta ................ Group
901.01.257.... AF3 55.1 9.3 ................ 1
Use match() to look up each value of the data frame's channel column in the names of groups, take the corresponding group number, and add it as a new column.
m6$group <- groups[match(m6$channel, names(groups))]
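To see the lookup on a small case, here is a sketch with made-up values. The named vector below is a hypothetical stand-in for the output of cutree(hc2, k = 3), which returns group numbers named by the clustered items; the fname and delta values are invented.

```r
# Hypothetical stand-in for cutree(hc2, k = 3): group number per channel name
groups <- c(AF3 = 1, AF4 = 1, F3 = 2, F8 = 2, O1 = 3)

# Made-up slice of the EEG data frame
m6 <- data.frame(
  fname = c("child1", "child1", "child2", "child2"),
  channel = c("AF3", "O1", "F8", "AF3"),
  delta = c(55.1, 40.2, 61.7, 48.9),
  stringsAsFactors = FALSE
)

# match() gives, for each row, the position of its channel in names(groups);
# indexing groups by those positions returns the group numbers
m6$group <- unname(groups[match(m6$channel, names(groups))])
print(m6)
```

A channel that is missing from names(groups) gets NA in the new column, which makes mismatched labels easy to spot.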
My data looks like this.
AK ALASKA DEPT OF PUBLIC SAFETY 1005-00-073-9421 RIFLE,5.56 MILLIMETER
AK ALASKA DEPT OF PUBLIC SAFETY 1005-00-073-9421 RIFLE,5.56 MILLIMETER
I am looking to filter the data in multiple different ways. For example, I filter by the type of equipment, such as column 4, with the code
rifle.off <- city.data[[i]][city.data[[i]][,4]=="RIFLE,5.56 MILLIMETER",]
Where city.data is a list of matrices with data from 31 cities (so I iterate through a for loop to isolate the rifle data for each city). I would like to also filter by the number in the third column. Specifically, I only need to filter by the first two digits, i.e. I would like to isolate all line items where the number in column 3 begins with '10'. How would I modify my above code to isolate only the first two digits but let all the other digits be anything?
Edit: Providing an example of the city.data matrix as requested. First off city.data is a list made with:
city.data <- list(albuq, austin, baltimore, charlotte, columbus, dallas, dc, denver, detroit)
where each city name is a matrix. Each individual matrix is isolated by police department using:
phoenix <- vector()
for (i in 1:nrow(gun.mat)){
if (gun.mat[i,2]=="PHOENIX DEPT OF PUBLIC SAFETY"){
phoenix <- rbind(gun.mat[i,],phoenix)
}
}
where gun.mat is just the original matrix containing all observations. phoenix looks like
state police.dept nsn type quantity price date.shipped name
AZ PHOENIX DEPT OF PUBLIC SAFETY 1240-01-411-1265 SIGHT,REFLEX 1 331 1 3/29/13 OPTICAL SIGHTING AND RANGING EQUIPMENT
AZ PHOENIX DEPT OF PUBLIC SAFETY 1240-01-411-1265 SIGHT,REFLEX 1 331 1 3/29/13 OPTICAL SIGHTING AND RANGING EQUIPMENT
AZ PHOENIX DEPT OF PUBLIC SAFETY 1240-01-411-1265 SIGHT,REFLEX 1 331 1 3/29/13 OPTICAL SIGHTING AND RANGING EQUIPMENT
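Try this:
Start from the original data in the first block of the question and subset it by equipment type (subset() needs a data frame, so convert a matrix with as.data.frame() first).
Rifle556 <- subset(data, column4 == "RIFLE,5.56 MILLIMETER")
After that, subset the result again, keeping only the rows whose column 3 starts with "10". The "^" anchors the pattern to the start of the string, and grepl() returns the logical vector that subset() expects.
s <- "^10"
Rifle55610 <- subset(Rifle556, grepl(s, column3))
This way you have the data subset according to your condition.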
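Since city.data holds matrices rather than data frames, the same filter can also be written in the indexing style used in the question. A minimal sketch, assuming character matrices with the number in column 3 and the type in column 4; the first two rows reuse values shown in the question, while the third row is made up for illustration.

```r
# Small character matrix shaped like the question's data (4 columns shown)
city_mat <- matrix(c(
  "AK", "ALASKA DEPT OF PUBLIC SAFETY", "1005-00-073-9421", "RIFLE,5.56 MILLIMETER",
  "AZ", "PHOENIX DEPT OF PUBLIC SAFETY", "1240-01-411-1265", "SIGHT,REFLEX",
  "AK", "ALASKA DEPT OF PUBLIC SAFETY", "1005-00-000-0000", "EXAMPLE ITEM"  # made-up row
), ncol = 4, byrow = TRUE)

city.data <- list(city_mat)

# Rows whose column 3 begins with "10", whatever the remaining digits are
starts_with_10 <- lapply(city.data, function(m) {
  m[startsWith(m[, 3], "10"), , drop = FALSE]
})

# Both conditions at once: rifle type AND number starting with "10"
rifles_10 <- lapply(city.data, function(m) {
  keep <- m[, 4] == "RIFLE,5.56 MILLIMETER" & startsWith(m[, 3], "10")
  m[keep, , drop = FALSE]
})

print(starts_with_10[[1]])
print(rifles_10[[1]])
```

`drop = FALSE` keeps the result a matrix even when only one row matches, and lapply() replaces the explicit for loop over the list.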
I need to show in a diagram how the variable avgflow has evolved over time (1992-2006) for three groups of observations: i) intra-Euroland trade flows (EMU-EMU country pairs), ii) extra-Euroland trade flows (non EMU-non EMU country pairs), and iii) trade flows between EMU and non EMU country pairs. The three groups must be kept constant over time, so that, e.g., Germany-France country pairs are classified as EMU-EMU for all years 1992-2006. 1999 should be used as index 100.
I have created two dummy variables for the three groups of observations. The dummy variable emu is 1 for EMU-EMU country pairs and 0 for non EMU-non EMU country pairs. The dummy variable emu1 is 1 for trade flows between EMU and non EMU country pairs.
I know I should use PROC GPLOT, but I am not sure exactly how to use it in this case. Can someone help me?
Thanks in advance.
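The indexing step itself (1999 = 100) is plain arithmetic: within each group, divide avgflow by that group's 1999 value and multiply by 100. The sketch below illustrates this in R rather than SAS, so it is not a PROC GPLOT answer. The numbers, the three-year range, and the column names `group` and `index` are made up; in the real data the group label would be derived from the emu and emu1 dummies.

```r
# Made-up yearly averages for the three country-pair groups
flows <- data.frame(
  year = rep(c(1998, 1999, 2000), times = 3),
  group = rep(c("EMU-EMU", "nonEMU-nonEMU", "EMU-nonEMU"), each = 3),
  avgflow = c(90, 100, 120, 45, 50, 55, 76, 80, 100),
  stringsAsFactors = FALSE
)

# Base value per group: avgflow in 1999
is_base <- flows$year == 1999
base <- setNames(flows$avgflow[is_base], flows$group[is_base])

# Index series: 100 in 1999 for every group
flows$index <- 100 * flows$avgflow / unname(base[flows$group])
print(flows)

# One line per group on a shared axis
plot(NULL, xlim = range(flows$year), ylim = range(flows$index),
     xlab = "Year", ylab = "avgflow (1999 = 100)")
grp <- unique(flows$group)
for (k in seq_along(grp)) {
  sub <- flows[flows$group == grp[k], ]
  lines(sub$year, sub$index, type = "b", lty = k)
}
legend("topleft", legend = grp, lty = seq_along(grp), bty = "n")
```

The same two steps (compute the per-group 1999 base, then plot the index by year with one line per group) carry over to any plotting tool.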