I have a dataset of films with several columns, one of which is the country. Because some films are produced by more than one country, a single film can have several countries listed together in the "country" column. For example,
[screenshot of the dataset]
I now want to create a new dataset in which each row of the “country” column contains only one country. For example, in the screenshot above, Bluebeard is produced by “France”, “Germany”, and “Italy”. I want the dataset to show Bluebeard once for each of “France”, “Germany”, and “Italy”, on separate rows.
I tried the strsplit() and colsplit() functions, but they don't seem to turn the comma-separated "country" column into rows that contain only one country each.
Any suggestions?
Thank you!
Using tidyr:
separate_rows(data, country, sep = ", ")
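As a minimal sketch of how that call behaves, here is a small made-up data frame; the `films` object, the `title` column, and the second film are assumptions for illustration, not the asker's data. Note that `sep` is treated as a regular expression, so `", "` only matches a comma followed by a single space.

```r
library(tidyr)

# Hypothetical sample data: one row per film, countries comma-separated.
films <- data.frame(
  title   = c("Bluebeard", "Some Film"),
  country = c("France, Germany, Italy", "Japan"),
  stringsAsFactors = FALSE
)

# One output row per (film, country) pair; the other columns are repeated.
films_long <- separate_rows(films, country, sep = ", ")
print(films_long)
```

If the spacing after the commas is inconsistent, a pattern such as `sep = ",\\s*"` may be a safer choice.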
Related
I have one column that holds several variables at once, as shown below, and I have to split it into separate variables. For example, the column contains the country, tax number, name, address, date, etc. I need to create one variable for each (Country: mx, Tax Number: DID100927249, etc.) from this column, but I am not sure how to do it. Does anyone know how I can do this?
{'country': ['mx'], 'taxNumber': ['DID100927249'], 'sourceUrl': ['https://sanctionssearch.ofac.treas.gov/Details.aspx?id=13009'], 'name': ['DISPOSITIVOS INDUSTRIALES DINAMICOS, S.A. DE C.V.'], 'alias': ['DISDA'], 'topics': ['sanction'], 'addressEntity': ['addr-f1636ec5b03213730a35200572c1273749727da4'], 'createdAt': ['2012-04-12']}
I tried the following code, but the order of the properties differs from row to row, so it does not work well :(
separate(df, col=properties, into=c('a', 'b','c','d','e'), sep=',')
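Because the key order varies between rows, parsing each string by key name may work better than splitting by position. The sketch below is one possible approach, not a definitive one: it assumes the values themselves contain no apostrophes, so that swapping single quotes for double quotes yields valid JSON. The shortened sample strings and the `parse_properties` helper are made up for illustration.

```r
library(jsonlite)
library(dplyr)

# Hypothetical sample data, shortened from the example row; key order differs.
df <- data.frame(
  properties = c(
    "{'country': ['mx'], 'taxNumber': ['DID100927249'], 'topics': ['sanction']}",
    "{'topics': ['sanction'], 'country': ['us']}"
  ),
  stringsAsFactors = FALSE
)

# Illustrative helper: turn one dict-like string into a one-row data frame.
# Assumption: no apostrophes inside the values.
parse_properties <- function(x) {
  parsed <- fromJSON(gsub("'", "\"", x, fixed = TRUE))
  # Collapse multi-valued entries into a single string per key.
  as.data.frame(lapply(parsed, paste, collapse = "; "), stringsAsFactors = FALSE)
}

# bind_rows() aligns columns by name and fills missing keys with NA.
result <- bind_rows(lapply(df$properties, parse_properties))
print(result)
```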
For an assignment, I have to use fuzzy matching in R to merge two datasets that both have a "Country" column. The first dataset is from Kaggle (the Countries dataset), while the other is from the ISO 3166 standard. I have already run the fuzzy matching, and it worked well. To both datasets I added a new column, named "Observation number", that numbers the observations from 1 up to the dataset's length (as far as I understand, this is required for fuzzy matching). The first dataset has 227 observations and the ISO dataset has 249.
I want to create a new dataset that includes the columns from my first dataset (I have to use this dataset specifically; it has columns such as migration, literacy, etc.) and the country codes from the ISO dataset. I couldn't manage to do it. The fuzzy matching output tells me where each observation of the first dataset sits in the ISO dataset. For example, if the first dataset is ordered Afghanistan, Albania, Algeria, ... while the ISO dataset is ordered Albania, Algeria, Afghanistan, the fuzzy match output is 3, 1, 2, ... I understand this to mean that the 3rd observation in the ISO dataset is the 1st in the Countries dataset.
I want to create a new dataset that has all the information from the Countries dataset, ordered with respect to the order of the ISO dataset's Country column.
However, I cannot do it using:
a <- Result1$matches$observationnumber
# a[i] is the position in the ISO dataset of the i-th observation of the Countries dataset
countryorderedlikeISO <- countries.of.the.world[match(a, countries.of.the.world$observation), ]
It seems to ignore the countries that are present in ISO but not in the country dataset.
What can I do? I want this new dataset to be in ISO's length, with NA values for observations that are present in ISO but not in Country.
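One way to get a result with the ISO dataset's length is to invert the match vector into an index of length `nrow(iso)` that is `NA` wherever an ISO row has no counterpart; indexing a data frame with `NA` yields a row of `NA`s. The sketch below uses tiny made-up data frames, and the names `countries`, `iso`, `iso_pos`, and the toy `literacy` values are assumptions for illustration, not the real Kaggle or ISO data.

```r
# Hypothetical stand-ins for the two datasets.
countries <- data.frame(
  country  = c("Afghanistan", "Albania", "Algeria"),
  literacy = c(10, 20, 30),          # toy values, not real figures
  stringsAsFactors = FALSE
)
iso <- data.frame(
  country = c("Albania", "Algeria", "Andorra", "Afghanistan"),
  code    = c("AL", "DZ", "AD", "AF"),
  stringsAsFactors = FALSE
)

# Assumed shape of the fuzzy-match output: iso_pos[i] is the row of `iso`
# that matched row i of `countries`.
iso_pos <- c(4, 1, 2)

# Invert it: for every ISO row, which row of `countries` matched (NA if none).
row_in_countries <- rep(NA_integer_, nrow(iso))
row_in_countries[iso_pos] <- seq_along(iso_pos)

# Indexing with NA produces all-NA rows, so the result keeps ISO's length.
ordered_like_iso <- cbind(iso, countries[row_in_countries, "literacy", drop = FALSE])
rownames(ordered_like_iso) <- NULL
print(ordered_like_iso)
```

If the matched names are available as a key column, `merge(iso, countries, by = ..., all.x = TRUE)` would be another way to keep unmatched ISO rows.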
I want to create multiple columns based on the subjects and marks. The values in the Name and Age columns stay the same across the different subjects, as shown.
My first table is the input data and the second one is the desired output.
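Since the tables themselves are not shown, the following is only a sketch under assumed column names (`Name`, `Age`, `Subject`, `Marks`) and made-up rows. It uses `tidyr::pivot_wider()` to turn each subject into its own column while keeping one row per Name and Age.

```r
library(tidyr)

# Hypothetical input: one row per (student, subject).
marks_long <- data.frame(
  Name    = c("Ann", "Ann", "Bob", "Bob"),
  Age     = c(15, 15, 16, 16),
  Subject = c("Maths", "English", "Maths", "English"),
  Marks   = c(80, 75, 60, 90),
  stringsAsFactors = FALSE
)

# Each distinct Subject becomes a column filled with the matching Marks.
marks_wide <- pivot_wider(marks_long, names_from = Subject, values_from = Marks)
print(marks_wide)
```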
Basically what we have is several columns as follows:
Household ID, restaurantspend, groceryspend, foodtruckspend
We have duplicate household ids because each spend is in its own individual column so an example of our data looks like this:
[image: data example]
We want only one row per Household ID, with the numerical values of the other columns combined.
aggdata <- aggregate(mydata, by = list(mydata$HouseHoldID), FUN = sum)
I created the above table and saved it as "mydata". Run the code above and view the output "aggdata": you will see an extra column, "Group.1", which holds the groups based on "HouseHoldID". You can ignore the second column, "HouseHoldID", since the same information is available in "Group.1".
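A variant that avoids the duplicated ID column is to aggregate only the spend columns and name the grouping variable. The sketch below assumes each spend sits on its own row with `NA` in the other spend columns, which is a guess about the data layout; the sample values are made up.

```r
# Hypothetical data: one row per individual spend, NA elsewhere.
mydata <- data.frame(
  HouseHoldID     = c(1, 1, 1, 2, 2),
  restaurantspend = c(10, NA, NA, 5, NA),
  groceryspend    = c(NA, 20, NA, NA, 7),
  foodtruckspend  = c(NA, NA, 3, NA, NA)
)

spend_cols <- c("restaurantspend", "groceryspend", "foodtruckspend")

# Sum only the spend columns; na.rm = TRUE is passed through to sum().
# Naming the list element gives the group column a readable name.
aggdata <- aggregate(
  mydata[spend_cols],
  by  = list(HouseHoldID = mydata$HouseHoldID),
  FUN = sum,
  na.rm = TRUE
)
print(aggdata)
```

With `na.rm = TRUE`, a household that has no value at all in a column gets 0 there rather than `NA`.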
I'll describe my data:
The first column holds the corine_values, going from 1 to 50.
The second column holds the bird_names. There are 70 different bird_names, and each corine_value has several bird_names.
The third column contains the sex of the bird.
The fourth column contains a V1 value (a measurement) that belongs to the category described by the first three columns.
I want to create a table where the row names are the bird_names: first all the females in alphabetical order, followed by the males in alphabetical order. The column names should be the corine_values, from small to large. The data in the table should be the corresponding V1 values.
I've been trying some things, but to be honest I'm just starting with R and I don't really have a clue how to do it. I can sort the data, but not on multiple levels (like alphabetical and sex combined). I'm exporting everything to Excel now and doing it manually, which is very time-consuming.
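A possible sketch with dplyr and tidyr follows. It assumes the columns are named `corine_value`, `bird_name`, `sex`, and `V1`, that sex is coded as "female"/"male" (so alphabetical order puts females first), and that each combination occurs at most once; the sample rows are made up. Because a bird name can appear under both sexes, the names are kept as ordinary columns rather than row names.

```r
library(dplyr)
library(tidyr)

# Hypothetical sample in the described long layout.
birds <- data.frame(
  corine_value = c(2, 1, 1, 2, 1),
  bird_name    = c("robin", "robin", "wren", "wren", "robin"),
  sex          = c("male", "male", "female", "female", "female"),
  V1           = c(0.4, 0.3, 0.1, 0.2, 0.5),
  stringsAsFactors = FALSE
)

bird_table <- birds %>%
  # One column per corine_value, ordered from small to large.
  pivot_wider(names_from = corine_value, values_from = V1, names_sort = TRUE) %>%
  # Sort on two levels: sex first, then bird name within each sex.
  arrange(sex, bird_name)

print(bird_table)
```

Combinations without a measurement show up as `NA`, and `arrange()` with several arguments is what gives the multi-level sort.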