Association Rules and Data Mining With RapidMiner

A good association rule set never fails to impress me. I love the hypothetical made concrete, the hunch turned into fact – attributes become relationships, numbers become involvement between tuples, fields, and tables. All in all, we live in interesting times, and making sense of all of this allows us as people to continue achieving, pushing ahead. Up, up, and beyond.

Here is an association graph made with RapidMiner that shows associations between various social groups in a community. While the data is probably fictitious, the connections are plausible. These associations are the type of relationship that humans identify easily, but that machines used to have a difficult time identifying. (Strictly speaking, an association rule captures co-occurrence rather than proven causation.) Given this, I would offer a maxim: The truth is in the algorithm.

[Image: association graph made with RapidMiner]
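To make the idea of an association a little more concrete, here is a minimal Python sketch of the three numbers usually reported for a rule: support, confidence, and lift. The membership data and the function names are my own invention for illustration; this is not how RapidMiner computes its graph internally, just the textbook definitions under the assumption that every antecedent and consequent appears at least once.

```python
# Hypothetical toy data: each set lists the groups one community member belongs to.
memberships = [
    {"choir", "church"},
    {"choir", "church", "garden club"},
    {"church", "garden club"},
    {"choir", "church"},
    {"sports league"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Assumes both sides occur at least once, otherwise a division by zero occurs.
    """
    both = support(antecedent | consequent, transactions)
    confidence = both / support(antecedent, transactions)
    lift = confidence / support(consequent, transactions)
    return both, confidence, lift


rule_support, rule_confidence, rule_lift = rule_metrics(
    {"choir"}, {"church"}, memberships
)
print(rule_support, rule_confidence, rule_lift)
```

A lift above 1 suggests the two groups turn up together more often than chance alone would predict, which is the kind of link an association graph draws as an edge.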


Data Mining for the Masses and Correlation Matrices

I’m working through Data Mining for the Masses (yes, at the same time as I’m working through Machine Learning for R). I’ve found that hitting the same topic from multiple angles helps to embed the concepts and lessons much more firmly. Some people would say that approaching R and RapidMiner at the same time is foolhardy, but I actually think it is vital to learning data science in depth. A critical facet of data science is that it is tool-heavy, with players such as SPSS, SAS, SAP, Oracle, and IBM all fighting over the same data real estate. I’ve always been prone to using free software packages with long-term support (hence, R and RapidMiner), but I feel that these skills translate well across playing fields – much of data analytics rests on foundational skills, which come down to the universal concepts of statistics, probability, and mathematics.

To elucidate my current lesson, there is a 1,400-item housing data set that I’ve generated a correlation matrix for (see Data Mining for the Masses, Chapter 4). I’m always impressed with the alacrity with which RapidMiner generates these graphics and tables. The YALE project did well, all things considered.

[Image: correlation matrix generated in RapidMiner]
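For anyone curious what sits behind a correlation matrix, here is a small Python sketch that computes Pearson coefficients for every pair of columns. The column names and values are hypothetical stand-ins, not the book’s housing data, and the helper functions are mine; RapidMiner does all of this for you with one operator.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Map every pair of column names to its Pearson coefficient."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical columns for illustration only.
housing = {
    "square_feet": [900, 1200, 1500, 1800, 2400],
    "heating_cost": [110, 135, 160, 190, 250],
    "avg_age": [62, 35, 48, 29, 41],
}

matrix = correlation_matrix(housing)
for name, row in matrix.items():
    print(name, [round(row[other], 2) for other in housing])
```

Each coefficient falls between -1 and 1, the diagonal is always 1 (a column correlates perfectly with itself), and the matrix is symmetric, so only half of it needs reading.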

 

Posted in Data Science, Information Technology

Data Analysis With R

I’ve been working my way through the Machine Learning for Hackers book from O’Reilly press (which really should be named R for Machine Learning), and just finished a small data analysis project in R. While the syntax is a little awkward, the power under the hood of R is fantastic. ’nuff said.

I’d recommend the book, with the caveat that you’re going to need to reference the GitHub repository for both data and clarity regarding programming points. For example, Infochimps has a flawed security certificate (making downloading data sets dodgy), and there were enough coding errors in the book’s first chapter to cause me to develop 2-3 hours’ worth of workarounds (and headaches). Not on par with the Head First HTML/CSS book, that’s for sure – Machine Learning seems hastily put together, but worthwhile for the knowledge core.

Anyways, here’s the project output, an analysis produced in R of roughly 45,000 UFO sightings from 1990-2010, by US state, month, and year.

UFO Sightings

45,000 UFO sightings by state, year, and month.
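My project was done in R, but the core of the analysis (counting sightings per state, year, and month within a date window) is easy to sketch in any language. Here is a rough Python version using a handful of made-up records; the record layout and the function name are assumptions for illustration, not the book’s code or data.

```python
from collections import Counter
from datetime import date

# Hypothetical records: (date of sighting, US state abbreviation).
sightings = [
    (date(1995, 7, 4), "CA"),
    (date(1995, 7, 19), "CA"),
    (date(1995, 8, 2), "WA"),
    (date(2003, 7, 11), "CA"),
    (date(1989, 6, 1), "TX"),  # outside the 1990-2010 window, dropped below
]


def count_by_state_month(records, first_year=1990, last_year=2010):
    """Count sightings per (state, year, month), keeping only the given year range."""
    counts = Counter()
    for when, state in records:
        if first_year <= when.year <= last_year:
            counts[(state, when.year, when.month)] += 1
    return counts


counts = count_by_state_month(sightings)
for key in sorted(counts):
    print(key, counts[key])
```

From counts like these, a per-state time series plot is only a charting call away.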

Posted in Data Science, Education, Information Technology, Resource-a-rama

The Penn Data Store and Medical Data Integration

Here is the second poster presentation from the NEDB 2013 conference at MIT. The conference was on Feb. 1, 2013, and was a boat-load of fun.

ABSTRACT:

As a premier research institution, the University of Pennsylvania harbors numerous databases: these run the gamut from clinical, research, and financial to genomic and neurological. Integrating all these disparate data sources has become a massive endeavor at the University’s Medical System. In order to accommodate the organization’s vast needs for data, and to assist in accomplishing the objective of becoming a data-driven enterprise, Penn built the Penn Data Store clinical data warehouse, one of 12 CCHIT-certified Data Warehouses within the USA designated for clinical research use. This warehouse is an ongoing project: it currently incorporates inpatient and outpatient data from eleven distinctly different medical record systems, and is constantly in the process of assimilating even more content from Penn’s medical databases in order to directly contribute to medical research.

Posted in Careers And Work, Education, Genetics, Information Technology, Resource-a-rama

The Future of Health Data

Here is the poster that I presented at the New England Database Conference 2013, held over at MIT’s Stata Center in Cambridge, MA. I’ve also excerpted the abstract for ease of reading (and so that Google will zero in on this page).

ABSTRACT: Medicine is already becoming more dependent on the medical data contained within the medical record systems mandated by the Recovery Act. Doctors and medical practitioners have begun to shift into a data-based decision-making paradigm. The next great leap that data-driven medicine will take will involve assimilation of proteomic- and genomic-level databases into the medical record system itself. This will enable three key functions: the foundation of data mining/predictive modeling, better patient care via greater depth of knowledge, and the ability to tailor gene-based or protein-specific treatments to the patient. All of this requires massive databases and data-driven enterprises: it can be argued that medicine is becoming a true data-focused field.

Posted in Careers And Work, Education, Genetics, Information Technology, Resource-a-rama

The Boggan’s Market: Adventure Paths, Vol. I

http://amzn.com/B0096RXU0U

Come find your way to adventure in this fun book – choose your own path through the mysterious market at the edge of the forest. Explore another world with this interactive book of fantasy and magic. Fight ogres, meet gypsies, and achieve your dreams!

For something completely different, check out the choose-your-own-adventure book I wrote. Good fun!

[Image]


Don’t Get Lazy, and Learn for Life

I am enrolled in Drexel’s Master of Science in Information Systems (MSIS) program, and have only a few classes to go; currently, I’m finishing up my pre-req courses. Before this, I completed my MSLIS (MS in Library and Information Science), also at Drexel.

I have taken both programs online; part of my work’s benefits includes a hearty stipend for education. Working full-time doesn’t leave a lot of time for attending courses at the actual campus, even though theoretically I’m about 3 blocks from the iSchool’s building. There isn’t anything more or less challenging about online graduate school, except that the coursework needs to be attended to on a different schedule, and the connections that I’ve made in the online programs are perhaps a little less solid. There’s something to be said for face-to-face contact with your professional and academic colleagues.

Drexel’s iSchool is highly rated (#3 in IS, #9 in LIS), for what it’s worth. The school is in the Top 100 for national universities, and it is also a major research institution. Compared to the University of Pennsylvania, there’s a distinct lack of that Ivy League absolute passion – I work with Penn grads, and am married to a Penn grad. I can tell you that Penn deserves its #5 ranking. Drexel graduate students are excellent, but I’m unsure of the undergrads; within both the MSIS and MSLIS programs, students have pushed and been pushed to succeed.

Drexel’s reputation is definitely oriented towards IT, IS, and Comp. Sci. When I applied to graduate school the first time, I looked at six programs – Wisconsin, Drexel, and a few others. Part of my goal was to find a quality distance program. The hard part isn’t getting into a program, but finishing it. If you’re applying to graduate school, find out the completion rates for the program you’re thinking about entering – that’s a helpful piece of advice. Drexel accepts about 20% of the applicants to its MSIS program, and (as of a few years ago) had about 400 applicants a year. Thus, they admit roughly 80 students a year to the MSIS. Within most PhD programs, things are much more selective – I know that the University of Washington put their numbers out there for the entering classes for both UW’s MSLIS and PhD programs a few years back. I’m always vaguely curious about the snapshots of scholastic competitiveness.

What can you do with an MSIS? Well, Drexel’s program includes a chunk of management courses (budgeting, software analysis, etc.). I’m looking into developing my career more towards data modeling and application architecture, for which the MSIS is well suited – and since my work will continue to pay for education, I’m also going to explore Syracuse University and Boston University’s certificate programs.

“Don’t get lazy” is a good motto to live by. Syracuse has a Data Mining certificate, and Boston has an Advanced Databases certificate – both are just for intellectual advancement: “Learn for life” is another good mental dictum. I get immensely frustrated with professors and their academic gibberish, but any good university still retains faculty with a breadth of knowledge about their domain(s). Wading through the academic muck is annoying, but the opportunity to learn and achieve is paramount.

Posted in Education, Information Technology, Wordplay and Commentary