Over the last 6 months, I have been working intensely with a host of data mining tools. Some were good, some were lousy, and some I can only rate as excellent. You will need to see my later posts for some of my results, but the software itself deserves a bit of praise.
RapidMiner 5.1 is probably the crown jewel of the software that I worked with. It is visually appealing and fairly simple, and I did most of my data mining with it. I still work with it, and love to delve into problems and data sets using the built-in learning algorithms. I have to say that web scraping combined with the clustering and Naive Bayes algorithms can pull some great results out of nearly any dataset. I do, on the other hand, need a stronger processor.
Weka 3.6.4 gets an honorable mention for being such an awesome piece of software. I guess it doesn’t rate higher on my list of open-source goodness because it is available as a suite that I can download with RapidMiner. It definitely floated my boat, and is a great place to start learning data mining basics.
GATE is by far my #1 choice for text parsing. I was able to feed in an entire directory of text files and extract relevant material in only a minute or so. The drawback to GATE isn’t the GUI; it’s that the system needs more documentation. I found myself going down dead ends trying to get things smoothed out. One thing that I love about GATE is that it can pull keywords such as names and places – and the keywords weren’t restricted to English-centric ones. GATE’s learning methods pulled ‘Abdul’ and ‘Sheik’ as related constructs – nicely done!