Converting a column in a single cell in R - r

I have a dataframe "animal" like the following:
Word Frequency
Dog 5
Cat 6
I want it to look like the following:
Word
"Dog","Cat"
I have used as.vector, as.list but havent been successful. Please help. Thanks

You can use toString
toString(animal$Word)
#[1] "Dog, Cat"

Related

How do I find the sum of a category under a subset?

So... I'm very illiterate when it comes to RStudio and I'm using this program for a class... I'm trying to figure out how to sum a subset of a category. I apologize in advance if this doesn't make sense but I'll do my best to explain because I have no clue what I'm doing and would also appreciate an explanation of why and not just what the answer would be. Note: The two lines I included are part of the directions I have to follow, not something I just typed in because I knew how to - I don't... It's the last part, the sum, that I am not explained how to do and thus I don't know what to do and would appreciate help figuring out.
For example,
I have this:
category_name category2_name
1 ABC
2 ABC
3 ABC
4 ABC
5 ABC
6 BDE
5 EFG
7 EFG
I wanted to find the sum of these numbers, so I was told to put in this:
sum(dataname$category_name)
After doing this, I'm asked to type this in, apparently creating a subset.
allabc <- subset(dataname, dataname$category_name2 == "abc")
I created this subset and now I have a new table popped up with this subset. I'm asked to sum only the numbers of this ABC subset... I have absolutely no clue on how to do this. If someone could help me out, I'd really appreciate it!
R is the software you are using. It is case-sensitive. So "abc" is not equal to "ABC".
The arguments are the "things" you put inside functions. Some arguments have the same name as the functions (which is a little confusing at first, but you get used to this eventually). So when I say the subset argument, I am talking about your second argument to the subset function, which you didn't name. That's ok, but when starting to learn R, try to always name your arguments.
So,
allabc <- subset(dataname, dataname$category_name2 == "abc")
Needs to be changed to:
allabc <- subset(dataname, subset=category2_name == "ABC")
And you also don't need to specify the name of the data again in the subset argument, since you've done that already in the first argument (which you didn't name, but almost everyone never bothers to do that).
This is the most easily done using tidyverse.
# Your data
data <- data.frame(category_name = 1:8, category_name2 = c(rep("ABC", 5), "BDE", "EFG", "EFG"))
# Installing tidyverse
install.packages("tidyverse")
# Loading tidyverse
library(tidyverse)
# For each category_name2 the category_name is summed
data %>%
group_by(category_name2) %>%
summarise(sum_by_group = sum(category_name))
# Output
category_name2 sum_by_group
ABC 15
BDE 6
EFG 15

R - Filter any rows and show all columns

I would like to an output that shows the column names that has rows containing a string value. Assume the following...
Animals Sex
I like Dogs Male
I like Cats Male
I like Dogs Female
I like Dogs Female
Data Missing Male
Data Missing Male
I found an SO tread here, David Arenburg provided answer which works very well but I was wondering if it is possible to get an output that doesn't show all the rows. So If I want to find a string "Data Missing" the output I would like to see is...
Animals
Data Missing
or
Animal
TRUE
instead of
Anmials Sex
Data Missing Male
Data Missing Male
I have also found using filters such as df$columnName works but I have big file and a number of large quantity of column names, typing column names would be tedious. Assume string "Data Missing" is also in other columns and there could be different type of strings. So that is why I like David Arenburg's answer, so bear in mind I don't have two columns, as sample given above.
Cheers
One thing you could do is grep for "Data Missing" like this:
x <- apply(data, 2, grep, pattern = "Data Missing")
lapply(x, length) > 1
This will give you the:
Animal
TRUE
result you're after. It's also good because it checks all columns, which you mentioned was something you wanted.
If we want only the first row where it matches, use match
data[match("Data Missing", data$Animals), "Animals", drop = FALSE]
# Animals
#5 Data Missing

How to convert dataframe to list of lists?

I have a dataset that looks like a sentiment dictionary.
Say the data is like following:
sentiment <-read.table(header=TRUE,text='
term score
awesome 3
good 0
interesting 2
power 1
bad -1
horrible -2
worst -3' )
I want to convert this data into a list of lists by score. Something like following (list names can be different):
$pos3
[1] awesome
$pos2
[1] interesting
$pos1
[1] power
$zero
[1] good
$neg1
[1] bad
$neg2
[1] horrible
$neg3
[1] worst
Other solutions suggested to use dlply:
dlply(sentiment, .(score), c)
But this produces score lists that I don't want. So I ended up using a dumb code like following:
list(neg3=sentiment$term[sentiment$score==-3],neg2=sentiment$term[sentiment$score==-2],
neg1=sentiment$term[sentiment$score==-1],zero=sentiment$term[sentiment$score==0],
pos1=sentiment$term[sentiment$score==1], pos2=sentiment$term[sentiment$score==2],
pos3=sentiment$term[sentiment$score==3])
But I wonder if there is a better way to do this.
How can I convert a dataframe to a list of lists without producing lists that I don't want?

Creating a vector from a file in R

I am new to R and my question should be trivial. I need to create a word cloud from a txt file containing the words and their occurrence number. For that purposes I am using the snippets package.
As it can be seen at the bottom of the link, first I have to create a vector (is that right that words is a vector?) like bellow.
> words <- c(apple=10, pie=14, orange=5, fruit=4)
My problem is to do the same thing but create the vector from a file which would contain words and their occurrence number. I would be very happy if you could give me some hints.
Moreover, to understand the format of the file to be inserted I write the vector words to a file.
> write(words, file="words.txt")
However, the file words.txt contains only the values but not the names(apple, pie etc.).
$ cat words.txt
10 14 5 4
Thanks.
words is a named vector, the distinction is important in the context of the cloud() function if I read the help correctly.
Write the data out correctly to a file:
write.table(words, file = "words.txt")
Create your word occurrence file like the txt file created. When you read it back in to R, you need to do a little manipulation:
> newWords <- read.table("words.txt", header = TRUE)
> newWords
x
apple 10
pie 14
orange 5
fruit 4
> words <- newWords[,1]
> names(words) <- rownames(newWords)
> words
apple pie orange fruit
10 14 5 4
What we are doing here is reading the file into newWords, the subsetting it to take the one and only column (variable), which we store in words. The last step is to take the row names from the file read in and apply them as the "names" on the words vector. We do the last step using the names() function.
Yes, 'vector' is the proper term.
EDIT:
A better method than write.table would be to use save() and load():
save(words. file="svwrd.rda")
load(file="svwrd.rda")
The save/load combo preserved all the structure rather than doing coercion. The write.table followed by names()<- is kind of a hassle as you can see in both Gavin's answer here and my answer on rhelp.
Initial answer:
Suggest you use as.data.frame to coerce to a dataframe an then write.table() to write to a file.
write.table(as.data.frame(words), file="savew.txt")
saved <- read.table(file="savew.txt")
saved
words
apple 10
pie 14
orange 5
fruit 4

Transforming character strings in R

I have to merge to data frames in R. The two data frames share a common id variable, the name of the subject. However, the names in one data frame are partly capitalized, while in the other they are in lower cases. Furthermore the names appear in reverse order. Here is a sample from the data frames:
DataFrame1$Name:
"Van Brempt Kathleen"
"Gräßle Ingeborg"
"Gauzès Jean-Paul"
"Winkler Iuliu"
DataFrame2$Name:
"Kathleen VAN BREMPT"
"Ingeborg GRÄSSLE"
"Jean-Paul GAUZÈS"
"Iuliu WINKLER"
Is there a way in R to make these two variables usable as an identifier for merging the data frames?
Best, Thomas
You can use gsub to convert the names around:
> names
[1] "Kathleen VAN BREMPT" "jean-paul GAULTIER"
> gsub("([^\\s]*)\\s(.*)","\\2 \\1",names,perl=TRUE)
[1] "VAN BREMPT Kathleen" "GAULTIER jean-paul"
>
This works by matching first anything up to the first space and then anything after that, and switching them around. Then add tolower() or toupper() if you want, and use match() for joining your data frames.
Good luck matching Grassle with Graßle though. Lots of other things will probably bite you too, such as people with two first names separated by space, or someone listed with a title!
Barry
Here's a complete solution that combines the two partial methods offered so far (and overcomes the fears expressed by Spacedman about "matching Grassle with Graßle"):
DataFrame2$revname <- gsub("([^\\s]*)\\s(.*)","\\2 \\1",DataFrame2$Name,perl=TRUE)
DataFrame2$agnum <-sapply(tolower(DataFrame2$revname), agrep, tolower(DataFrame1$Name) )
DataFrame1$num <-1:nrow(DataFrame1)
merge(DataFrame1, DataFrame2, by.x="num", by.y="agnum")
Output:
num Name.x Name.y revname
1 1 Van Brempt Kathleen Kathleen VAN BREMPT VAN BREMPT Kathleen
2 2 Gräßle Ingeborg Ingeborg GRÄSSLE GRÄSSLE Ingeborg
3 3 Gauzès Jean-Paul Jean-Paul GAUZÈS GAUZÈS Jean-Paul
4 4 Winkler Iuliu Iuliu WINKLER WINKLER Iuliu
The third step would not be necessary if DatFrame1 had rownames that were still sequentially numbered (as they would be by default). The merge statement would then be:
merge(DataFrame1, DataFrame2, by.x="row.names", by.y="agnum")
--
David.
Can you add an additional column/variable to each data frame which is a lowercase version of the original name:
DataFrame1$NameLower <- tolower(DataFrame1$Name)
DataFrame2$NameLower <- tolower(DataFrame2$Name)
Then perform a merge on this:
MergedDataFrame <- merge(DataFrame1, DataFrame2, by="NameLower")
In addition to the answer using gsub to rearrange the names, you might want to also look at the agrep function, this looks for approximate matches. You can use this with sapply to find the matching rows from one data frame to the other, e.g.:
> sapply( c('newyork', 'NEWJersey', 'Vormont'), agrep, x=state.name, ignore.case=TRUE )
newyork NEWJersey Vormont
32 30 45

Resources