Vellum Information: What to read
Category Archives: Data Science
Big Data Analysis Tools/Resources: An Annotated Bibliography
1. Chris Stucchio’s blog: http://www.chrisstucchio.com/blog/2013/hadoop_hatred.html This blog makes this list primarily for one article: “Don’t use Hadoop – your data isn’t that big”, which provides a guide for deciding whether your data really qualifies as big … Continue reading
‘Big Data’ Public Databases: An Annotated Bibliography
1. Kin Lane’s Federal Dataset Tool: http://federal-agency-dataset-adoption.publicprivatesector.org/index.html Many of the following listings refer to US Federal Government datasets. These are some of the biggest public datasets available. Unfortunately, much of this data is messy, published without much … Continue reading
Posted in Data Science, Digital Libraries, Information Technology, Resource-a-rama
Tagged Open data, social scientists
Future Works
I’m really looking forward to getting my new desktop computer. My wife and I got a newer system for Xmas (in the Futurama tradition for holiday names), and I plan on crunching more of the World Bank data into KML. … Continue reading
Association Rules and Data Mining With RapidMiner
A good association rule set never fails to impress me. I love the hypothetical made concrete, the hunch turned into fact – attributes become relationships, numbers become involvement between tuples, fields, and tables. All in all, we live in interesting … Continue reading
Posted in Data Science, Education, Information Technology
Tagged Association rule learning, datamining, Graphical user interface, IBM, Java, Math, RapidMiner, WEKA
Data Mining for the Masses and Correlation Matrices
I’m working through Data Mining for the Masses (yes, at the same time as I’m working through Machine Learning for R). I’ve found that hitting the same topic from multiple angles helps to embed the concepts and lessons much more … Continue reading
Posted in Data Science, Information Technology
Tagged data mining, databases, IBM, Machine Learning, RapidMiner, SAP, SAP AG, SPSS
Data Analysis With R
I’ve been working my way through the Machine Learning for Hackers book from O’Reilly press (which really should be named R for Machine Learning), and just finished a small data analysis project in R. While the syntax is a little … Continue reading
Jaro-Winkler in ORACLE and textual fuzzy matching
There is a little-known (and hence heavily under-utilized) function in Oracle 11g and up. This is the Jaro-Winkler algorithm (and the companion algorithm named Edit Distance). The Jaro-Winkler algorithm tells you what level of permutation would be necessary in order … Continue reading →
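For readers who want a feel for what a Jaro-Winkler score measures before opening the full post, here is a minimal Python sketch of the textbook algorithm. It is not Oracle's implementation. The function names `jaro` and `jaro_winkler` and the parameters `prefix_scale` and `max_prefix` are assumptions chosen for illustration; the defaults (0.1 and 4) are the values commonly quoted for the algorithm. The score runs from 0.0 (nothing in common) to 1.0 (identical strings).

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: 0.0 means no similarity, 1.0 means identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters only count as matching if they sit within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0

    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = True
                matched2[j] = True
                matches += 1
                break

    if matches == 0:
        return 0.0

    # Walk the matched characters in order; each out-of-order pair
    # contributes half a transposition.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2

    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str,
                 prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for strings that share a common prefix."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1 - j)


if __name__ == "__main__":
    # Classic example pair: one transposed letter pair, shared prefix "MAR".
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
```

The prefix boost is what separates Jaro-Winkler from plain Jaro: two names that start the same way score higher than two names that differ early, which tends to suit fuzzy matching of names and other short text fields.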