Must have R – Rattle! Get your own corpus and construct a sentiment score

Keller School of Management BIAM 510 Week 7 lab – an example is attached, but you must have Excel and R Rattle. Two text files are also attached.

You’re going to go out and get your own corpus, your own directory of documents.

You’re going to import it into Rattle and generate a Term Document Matrix, export it to a .CSV file, import it into Excel, and then you’re going to construct a sentiment score. You’re also going to be asked to update, or upgrade if you will, the words that we used in that table.

For the corpus that we’re talking about here, we want you to find at least three text files to use. They can be something you want out of your own records. You could go on the internet, which is exactly what I did for the other two, find an announcement or something of interest to you about a company or a set of companies, and just do a copy and paste operation into a text file.

So whatever combination of things you want, do at least three documents of some length. They don’t have to be 15 or 20 pages or anything like that, but something that’s worth doing an analysis of. And then create a directory in your working directory to put those files in, because Rattle’s going to require that you’ve pointed it to a directory with a set of text files in it.

Then import that into Rattle, create your Term Document Matrix, and take a look at the results. See whether stemming has been done or not; maybe you can check that stemming box on your particular installation. You should also see that the stop words were eliminated.
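To make it concrete what Rattle is producing at this step, here is a minimal Python sketch of a term document matrix with stop words removed. It is not what Rattle runs internally; the `STOP_WORDS` list, the sample documents, and the function name `term_document_matrix` are all made up for illustration, and no stemming is done.

```python
import re
from collections import Counter

# Hypothetical, very short stop-word list; a real one is much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "was"}

def term_document_matrix(docs):
    """Return (sorted terms, {document name: [count per term]})."""
    counts = {}
    for name, text in docs.items():
        words = re.findall(r"[a-z']+", text.lower())
        # Count every word that is not a stop word.
        counts[name] = Counter(w for w in words if w not in STOP_WORDS)
    terms = sorted(set().union(*counts.values()))
    # One row per document, one column per term.
    matrix = {name: [c[t] for t in terms] for name, c in counts.items()}
    return terms, matrix

# Made-up sample corpus standing in for your directory of text files.
docs = {
    "doc1.txt": "The company announced a strong quarter and strong growth.",
    "doc2.txt": "Growth was weak and the outlook is poor.",
}
terms, matrix = term_document_matrix(docs)
print(terms)
print(matrix)
```

Each row holds the counts for one document, which is the layout the later scoring steps assume.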

And then, finally, add a column in that first column section there, so that you can put in the names of the documents. That way you know something about what each row contains. Then export that to a .CSV file and, like we did earlier in the videos, import it into Excel using the Data Import Wizard.
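The shape of the exported file, with the document names in the first column, can be sketched like this. This is only an illustration of the layout, assuming rows are documents and columns are terms; the helper name `to_csv` and the sample counts are hypothetical, and the text is built in memory rather than written to disk.

```python
import csv
import io

def to_csv(terms, matrix):
    """Build CSV text: a header row of terms, then one named row per document."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["document"] + terms)   # first column holds document names
    for name, row in matrix.items():
        writer.writerow([name] + row)
    return buf.getvalue()

# Made-up counts for two documents and three terms.
terms = ["growth", "strong", "weak"]
matrix = {"doc1.txt": [1, 2, 0], "doc2.txt": [1, 0, 1]}
csv_text = to_csv(terms, matrix)
print(csv_text)
```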

So then you’re ready to start constructing a sentiment score. And what we want you to do in this particular section is like what we did before. You’re going to add a weights row above the input that you just did, and also some scoring columns on the end, with the appropriate formulas to calculate the scores.

So you can use the previous video to do that. Also include a test table, so you can make sure you have the right things there. You can put in, say, five, or six, or ten words as a test, to make sure it’s calculating the right weights from the look-up table.
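The weights row, the scoring column, and the test table can be sketched in Python as follows. The exact formulas from the previous video are not reproduced here, so this assumes the score is simply each word count times its weight, summed across the row, and that a word missing from the look-up table gets a weight of zero. The `LOOKUP` words and the counts are made up.

```python
# Hypothetical look-up table: word -> weight.
LOOKUP = {"strong": 1, "growth": 1, "weak": -1, "poor": -1}

def weight_for(term, lookup):
    """Look up a term's weight; words not in the table count as 0 (an assumption)."""
    return lookup.get(term, 0)

def sentiment_scores(terms, matrix, lookup):
    """Score each document as the sum of count * weight over all terms."""
    weights = [weight_for(t, lookup) for t in terms]   # the "weights row"
    return {
        name: sum(count * w for count, w in zip(row, weights))
        for name, row in matrix.items()
    }

terms = ["announced", "growth", "poor", "strong", "weak"]
matrix = {"doc1.txt": [1, 1, 0, 2, 0], "doc2.txt": [0, 1, 1, 0, 1]}
scores = sentiment_scores(terms, matrix, LOOKUP)

# Small test table: check a few words against the weights you expect.
test_table = {w: weight_for(w, LOOKUP) for w in ["strong", "weak", "announced"]}
print(scores)
print(test_table)
```

With these made-up numbers, doc1.txt comes out at 3 and doc2.txt at -1, and the test table shows that a word outside the look-up table contributes nothing.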

Then, what we’re going to ask you to do is improve on the word table and, again, add some additional muscle to the walking skeleton here. We’re going to ask you to use an “external lexicon.” If you were to go out to Google and query “sentiment analysis” or “opinion lexicon” or something like that, you would get a number of hits and, as we talked about before, various people’s sets of words that they consider to be negative and positive.

And we have chosen, just for this one, something we call the “Opinion lexicon,” and this is the citation for the first paper that it was in. We’ve included it in Doc Sharing, entitled “opinion lexicon.zip,” in the Doc Sharing for Week 7 materials. And what it looks like is two text files in a zip file.

One is a positive words file and one is a negative words file. You need to open those up in WordPad; not Notepad but WordPad. And when you do that, you’ll see that there is a set of documentation at the front of each of those text files. Delete both of those sets and then combine the files into a single text file, so that you have the positive words on top and the negative words on the bottom.
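If you would rather script the clean-up than do it by hand in WordPad, the same idea can be sketched in Python. This assumes the documentation lines at the front of each file start with a semicolon and that each remaining line holds one word; check your own copies before relying on that. The sample text and the function name `strip_header` are made up.

```python
def strip_header(text, comment_prefix=";"):
    """Keep only word lines; drop blank lines and lines starting with comment_prefix."""
    words = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith(comment_prefix):
            words.append(line)
    return words

# Made-up stand-ins for the contents of the two lexicon files.
positive_raw = "; documentation line\n;\n\ngood\ngreat\n"
negative_raw = "; documentation line\n\nbad\nawful\n"

# Positive words on top, negative words on the bottom.
combined = strip_header(positive_raw) + strip_header(negative_raw)
print(combined)
```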

Then what you’re going to do is add it to the Excel workbook that we have, using the Text Import Wizard as well. Note that these are going to be the words in the look-up table. And make sure that you’re careful where you put them, so you don’t overwrite something that’s already there, in terms of your words.

And in fact, one of the things you might think of doing is actually putting this in a separate worksheet, building your word table there, and changing the formulas appropriately across the top of that Term Document Matrix you now have in Excel. That way, if you were to use another corpus, other sets of documents, or even add to it, you wouldn’t have to be moving that around, or doing anything with it.

You would just be changing the look-up there; so, something to think about. You don’t have to do that, but it’s just something you can think about. Okay. The other thing is, we want you to add the weights there: plus one for the positive words and minus one for the negative.

You should be able to see there’s going to be a whole set of plus ones and a whole set of minus ones, each grouped together. So whichever you put on top, make sure you have the right sign for the weights. And then you’re going to need to revise the look-up functions.
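The weighting step and the sign check can be sketched like this: every positive word gets plus one, every negative word gets minus one, and the two blocks stay together in the order you stacked them. The word lists and the helper name `build_lookup_rows` are hypothetical.

```python
def build_lookup_rows(positive_words, negative_words):
    """Return (word, weight) rows: positives (+1) on top, negatives (-1) below."""
    rows = [(w, 1) for w in positive_words]
    rows += [(w, -1) for w in negative_words]
    return rows

# Made-up word lists standing in for the two lexicon files.
positive_words = ["good", "great"]
negative_words = ["bad", "awful"]
rows = build_lookup_rows(positive_words, negative_words)

# Sign check: the counts of +1 and -1 should match the sizes of the two lists.
n_pos = sum(1 for _, weight in rows if weight == 1)
n_neg = sum(1 for _, weight in rows if weight == -1)
print(rows)
print(n_pos, n_neg)
```

Counting the plus ones and minus ones against the sizes of the original lists is a quick way to catch a block that was given the wrong sign.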

The addresses there are going to have to be revised, because now you’re going to have a larger look-up table. Do a couple of tests as well before you do anything else. Finally, when you’re done with that, you can look at the scores on the end and see how the scores are doing across the documents.

And you should have some sort of feel for the documents, having at least read them through. See if you think that is a reasonable score for the document, or not. There’s really no right answer here. It all depends on how well these particular words, positive and negative, actually fit your particular situation.

And it also indicates whether or not just a minus one, plus one type of weighting system works well. So if it’s way off the chart, another improvement might be to go back in and change the weights; make them range from minus five to plus five, and see if that changes anything.
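To see how a wider weight range could change things, here is a small sketch that scores the same two documents under a plus one, minus one scheme and under a graded scheme running from minus five to plus five. The words, the graded weights, and the counts are all invented for illustration; choosing real graded weights is a modeling decision you would have to make yourself.

```python
def score(counts, weights):
    """Sum of count * weight; words without a weight contribute 0."""
    return sum(n * weights.get(word, 0) for word, n in counts.items())

# Two hypothetical weighting schemes for the same four words.
unit = {"good": 1, "excellent": 1, "bad": -1, "terrible": -1}
graded = {"good": 2, "excellent": 5, "bad": -1, "terrible": -5}

# Made-up word counts for two documents.
doc_a = {"excellent": 1, "bad": 2}
doc_b = {"good": 2, "terrible": 1}

unit_scores = {"A": score(doc_a, unit), "B": score(doc_b, unit)}
graded_scores = {"A": score(doc_a, graded), "B": score(doc_b, graded)}
print(unit_scores)
print(graded_scores)
```

In this made-up case document A scores below B with unit weights (-1 versus 1) but above it with graded weights (3 versus -1), so the choice of weights can change which document looks more positive.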

There are all sorts of options for developing or modeling this particular sentiment analysis. But then, in a separate worksheet, just go to an additional summary worksheet, write a summary of what you found, and include what you might do next to add some more muscle to the skeleton.