Category Archives: Data Science

Big Data Analysis Tools/Resources: An Annotated Bibliography

Big Data Analysis Tools/Resources 1. Chris Stucchio’s blog http://www.chrisstucchio.com/blog/2013/hadoop_hatred.html This blog makes the list primarily for one article, “Don’t use Hadoop – your data isn’t that big”, which provides a guide for deciding whether your data really qualifies as big … Continue reading

Posted in Careers And Work, Data Science, Education, Information Technology, Resource-a-rama | 2 Comments

‘Big Data’ Public Databases: An Annotated Bibliography

Big Public Databases 1. Kin Lane’s Federal Dataset Tool http://federal-agency-dataset-adoption.publicprivatesector.org/index.html Many of the following listings refer to US Federal Government datasets. These are some of the biggest public datasets available. Unfortunately, much of this data is messy, published without much … Continue reading

Posted in Data Science, Digital Libraries, Information Technology, Resource-a-rama | 3 Comments

Jaro-Winkler in ORACLE and textual fuzzy matching

Oracle 11g and up include a little-known (and hence heavily under-utilized) function: the Jaro-Winkler algorithm, along with its companion algorithm, Edit Distance. The Jaro-Winkler algorithm tells you what level of permutation would be necessary in order … Continue reading
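For readers who want to see what a Jaro-Winkler score measures, here is a minimal Python sketch of the textbook algorithm. It is an illustration only, not Oracle's implementation: the function names `jaro` and `jaro_winkler`, the prefix scale of 0.1, and the four-character prefix cap are assumptions taken from the commonly published definition, and Oracle's own results may differ in rounding or scaling.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means the strings are identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters count as matching only if they are equal and no farther
    # apart than this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0

    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break

    if matches == 0:
        return 0.0

    # Matched characters that appear in a different order are counted in
    # pairs; each pair is one transposition.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2

    return (
        matches / len1
        + matches / len2
        + (matches - transpositions) / matches
    ) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1,
                 max_prefix: int = 4) -> float:
    """Jaro similarity boosted for strings that share a common prefix."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


if __name__ == "__main__":
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
```

The prefix boost is the Winkler part: two strings that agree on their first few characters score higher than the plain Jaro value, which tends to suit names and other fields where typos cluster toward the end.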

Posted in Data Science, Information Technology, The Cloud, Wordplay and Commentary | Leave a comment

The Yin-Yang of Understanding Data

There are several issues with data. One is that it’s viewed with suspicion. Conversely, it is also treated as sacrosanct. I’d almost refer to this as the yin-yang of data understanding. When I come to findings or conclusions with … Continue reading

Posted in Careers And Work, Data Science, Information Technology, Resource-a-rama | Leave a comment

Future Works

I’m really looking forward to getting my new desktop computer. My wife and I got a newer system for Xmas (in the Futurama tradition for holiday names), and I plan on crunching more of the World Bank data into KML. … Continue reading

Posted in Data Science, Information Technology, Resource-a-rama, Wordplay and Commentary | Leave a comment

Association Rules and Data Mining With RapidMiner

A good association rule set never fails to impress me. I love the hypothetical made concrete, the hunch turned into fact – attributes become relationships, numbers become involvement between tuples, fields, and tables. All in all, we live in interesting … Continue reading

Posted in Data Science, Education, Information Technology | Leave a comment

Data Mining for the Masses and Correlation Matrices

I’m working through Data Mining for the Masses (yes, at the same time as I’m working through Machine Learning for Hackers). I’ve found that hitting the same topic from multiple angles helps to embed the concepts and lessons much more … Continue reading

Posted in Data Science, Information Technology | Leave a comment

Data Analysis With R

I’ve been working my way through the Machine Learning for Hackers book from O’Reilly press (which really should be named R for Machine Learning), and just finished a small data analysis project in R. While the syntax is a little … Continue reading

Posted in Data Science, Education, Information Technology, Resource-a-rama | 1 Comment