I need to create a data frame with the new column 'Variable1' in R below are the requirement. I have the one column name as 'Country' and another column name as 'City'. I need to check first if the data is available in the City column, if there is no data then move to the Country column based on the week. example:
Country Count
A 5
B 6
C 7
City Count
A 3
B 5
When I create a new column it first checks the count in the City column for all count values.. and if it's not available for it will move to Country and bring that count. and if even its not available in the country then it will just fill forward the last value
NEW Variable
A - 3
B-5
C-7
Can someone please assist with it? Is there any way to do it?
Related
My goal is to administer both variables and their values in a spreadsheet.
Basically I want to be able to add the new values for a new year in a new column and load them into R.
I then want to assign the variables named in the first column with the corresponding value in either one of the second or third column.
Input spreadsheet:
Variable
Year2013
Year2018
age
12
17
pets
c(cat,dog,elephant)
c(dog,mouse)
cars
cars$name
cars$name
Desired Output:
For year 2013
import("dataspreadsheet.csv")
derived from this -->
age <- 12
pets <- c(cat,dog,elephant)
cars <- cars$name
Is there any way to tell R to make this assignment?
Complete R novice here.
I have wide form data frame which includes a vector/variable for participant_number, with each participant providing two responses (score), with a within-subjects manipulation (code).
enter image description here
However, I have three separate sets of values which corresponded to the participant numbers in three different (between subjects) experimental groups (e.g. control, active_1, active_2).
enter image description here
How can I use these sets of values to create a variable in my main data frame which indicates what experimental group the participant belongs to?
Any help, much appreciated.
The package "dplyr" is quite useful for these kind of things. Let's consider a small working example
df <- data.frame(ID=c(1:7))
ListActive1 <- c(1,3)
ListActive2 <- c(2,5)
ListControl <- c(4,7,6)
df is the main data frame containing the ID of the participant (and of course it may have further columns, e.g. the score etc.) The three vectors contain for each group the IDs of the participants belonging to this particular group, e.g. the participants with ID 2 and 5 belong to the group "Active2".
Now we create a new column in the main data frame using the command mutate which comes with the dplyr package (make sure to install and load it).
df <- mutate(df,group=case_when(
ID %in% ListActive1 ~ "Active1",
ID %in% ListActive2 ~ "Active2",
ID %in% ListControl ~ "Control"))
The command case_when checks for each participant in which of the lists the ID appears and then puts the corresponding label in the new column group.
ID group
1 1 Active1
2 2 Active2
3 3 Active1
4 4 Control
5 5 Active2
6 6 Control
7 7 Control
I have an ascii file that contains one week of data. This data is a text file and does not have header names. I currently have nearly completed a smaller task using R, and have made some attempts with Python as well. Being a pro at neither, its been a steep learning curve. Here is my data/code to paste rows together based on a specific sequence of chr in R that I created and is not working.
Each column holds different data, but the row data is what matters most. for example:
column 1 column 2 column 3 column 4
Row 1 Name Age YR Birth Date
Row 2 Middle Name School name siblings # of siblings
Row 3 Last Name street number street address
Row 4 Name Age YR Birth Date
Row 5 Middle Name School name siblings # of siblings
Row 6 Last Name street number street address
Row 7 Name Age YR Birth Date
Row 8 Middle Name School name siblings # of siblings
Row 9 Last Name street number street address
I have a folder to iterate or loop over that some files hold 100's of rows, and others hold 1000's. I have a code written that drops all the rows I don't need, and writes to a new .csv however, any pasting and/or merging isn't producing the desirable results.
What I need is a code to select only the Name and Last name rows (and their adjacent data) from the entire file and paste the last name row beside the end of the name row. Each file has the same amount of columns but different rows.
I have the file to a data frame, and have tried merging/pasting/binding (r and c) the rows/columns, and the result is still just shy of what I need. Rbind works the best thus far, but instead of producing the data with the rows pasted one after another on the same line, they are pasted beside each other in columns like this:
ie:
Name Last Name Name Last Name Name Last Name
Age Street Num Age Street Num Age Street Num
YR Street address YR Street address YR Street address
Birth NA Birth NA Birth NA
Date NA Date NA Date NA
I have tried to rbind them or family[c(Name, Age, YR Birth...)] and I am not successful. I have looked at how many columns I have and tried to add more columns to account for the paste, and instead it populates with the data from row 1.
I'm really at a loss here and if anyone can provide some insight I'd really appreciate it. I'm newer than some, but not as new as others. The results I am achieving look like:
Name Age YR Birth date Last Name Street Num Street Address NA NA
Name Age YR Birth date Last Name Street Num Street Address NA NA
Name Age YR Birth date Last Name Street Num Street Address NA NA
codes tried:
rowData <- rbind(name$Name, name$Age, name$YRBirth, name$Date)
colData <- cbind(name$V1 == "Name", name$V1 == "Last Name")
merge and paste also do not work. I have tried to create each variable as new data frames and am still not achieving the results I am looking for. Does anyone have any insight?
Ok, so if I understand your situation correctly, you want to first slice your data and pull out every third row starting with the 1st row and then pull out every 3rd row starting with the 3rd row. I'd do it like this (assume your data is in df:
df1 <- df[3*(1:(nrow(df)/3)) - 2,]
df2 <- df[3*(1:(nrow(df)/3)),]
once you have these, you can just slap them together, but instead of using rbind you want to use cbind. Then you can drop the NA columns and rename them.
df3 <- cbind(df1,df2)
df3 <- df3[1:7]
colnames(df3) <- c("Name", "Age", "YR", "Birth date", "Last Name", "Street Num", "Street Address")
I have a dataset with multiple records, each of which is assigned a country, and I want to create a worldmap using rworldmap, coloured according to the frequency with which each country occurs in the dataset. Not all countries appear in the dataset - either because they have no corresponding records or because they are not eligible (e.g. middle/low income countries).
To build the map, I have created a dataframe (dfmap) based on a table of countries, where one column is the country code and the second column is the frequency with which it appears in the dataset.
In order to identify on the map countries which are eligible, but have no records, I have tried to use add_row to add these to my dataframe e.g. for Andorra:
add_row(dfmap, Var1="AND", Freq=0)
When I run add_row for each country, it appears to work (no error message and that new row appears in the table below the command) - but previously added rows where the Freq=0 do not appear.
When I then look at the dataframe using "dfmap" or "summary(dfmap)", none of the rows where Freq=0 appear, and when I build the map, they are coloured as for missing countries.
I'm not sure where I'm going wrong and would welcome any suggestions.
Many thanks
Using the method suggested in the comment above, one can use the join function and then the replace_na function to create a tibble with the complete country and give these a count value of zero.
As there was no sample data in the question i created two data frames below based on what I thought was implied by the question.
dfrm_counts = tibble(Country = c('England','Germany'),
Count = c(1,4))
dfrm_all = tibble(Country = c('England', 'Germany', 'France'))
dfrm_final = dfrm_counts %>%
right_join(dfrm_all, by = "Country") %>%
replace_na(list(Count = 0))
dfrm_final
# A tibble: 3 x 2
Country Count
<chr> <dbl>
1 England 1
2 Germany 4
3 France 0
Edit: using the aid from one of the users, I was able to use "table(ArrestData$CHARGE)", yet, since there are over 2400 entries, many of the entries are being omitted. I am looking for the top 5 charges, is there code for this? Additionally, I am looking at a particular council district (which is another variable titled "CITY_COUNCIL_DIST"). I want to see which are the top 5 charges given out within a specific council district. Is there code for this?
Thanks for the help!
Original post follows
Just like how I can use "names(MyData)" to see the names of my variables, I am wondering if I can use a code to see the names/responses/data points of a specific column.
In other words, I am attempting to see the names in my rows for a specific column of data. I would like to see what names are cumulatively being used.
After I find this, I would like to know how many times each name within the rows is being used, whether thats numeric or percentage. After this, I would like to see how many times each name within the rows is being used with the condition that it meets a numeric value of another column/variable.
Apologies if this, in any way, is confusing.
To go further in depth, I am playing around with the Los Angeles Police Data that I got via the Office of the Mayor's website. From 2017-2018, I am attempting to see what charges and the amount of each specific charge were given out in Council District 5. CHARGE and CITY_COUNCIL_DIST are the two variables I am looking at.
Any and all help will be appreciated.
To get all the distinct variables, you can use the unique function, as in:
> x <- c(1,1,2,3,3,4,5,5,5,6)
> unique(x)
[1] 1 2 3 4 5 6
To count the number of distinct values you can use table, as in:
> x <- c(1,1,2,3,3,4,5,5,5,6)
> table(x)
x
1 2 3 4 5 6
2 1 2 1 3 1
The first row gives you the distinct values and the second row the counts for each of them.
EDIT
This edit is aimed to answer your second question following with my previous example.
In order to look for the top five most repeated values of a variable we can use base R. To do so, I would first create a dataframe from your table of frequencies:
df <- as.data.frame(table(x))
Having this, now you just have to order the column Freq in descending order:
df[order(-df$Freq),]
In order to look for the top five most repeated values of a variable within a group, however, we need to go beyond base R. I would use dplyr to create an augmented dataframe with frequencies for each value of the variable of interest, let it be count_variable:
library(dplyr)
x_or <- x %>%
group_by(group_variable, count_variable) %>%
summarise(freq=n())
where x is your original dataframe, group_variable is the variable for your groups and count_variable is the variable you want to count. Now, you just have to order the object in a way you get the frequencies of your count_variable ordered by group_variables:
x_or %>%
arrange(group_variable, count_variable, freq)