Database Training Resources: An Annotated Bibliography


1.     Coursera/Stanford “Introduction to Databases”

Introduction to Databases from Stanford University was one of the first Massive Open Online Courses (MOOCs) offered in 2011, and it has remained consistently popular. The course covers database design and the use of database management systems in applications. It begins with the fundamental theory of database design, including the relational model and SQL, then moves on to contemporary topics in database management, including JSON and NoSQL systems. The course uses PostgreSQL, SQLite, and MySQL, and consists of video lectures, assignments, and exams. Discussion forums and the possibility of local meet-ups support learning.
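
For a flavor of the material, the relational basics the course opens with can be sketched against SQLite from Python (the table and data below are made up for illustration, not taken from the course):

```python
import sqlite3

# In-memory database: define a relation, insert tuples, query declaratively.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, gpa REAL)")
conn.executemany(
    "INSERT INTO student (id, name, gpa) VALUES (?, ?, ?)",
    [(1, "Ada", 3.9), (2, "Alan", 3.4), (3, "Grace", 3.7)],
)
# Relational-style query: filter and order without saying how to iterate.
rows = conn.execute(
    "SELECT name FROM student WHERE gpa >= 3.5 ORDER BY name"
).fetchall()
print([name for (name,) in rows])  # ['Ada', 'Grace']
```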

Keywords: MOOCs, Stanford, database, SQL, NoSQL, PostgreSQL, SQLite, MySQL

Audience: beginner-intermediate database users

2.     Coursera/University of Washington “Introduction to Data Science”

This MOOC is called “Introduction to Data Science,” but the first of its two major units is devoted to databases, including an introduction to MapReduce and Hadoop as well as an SQL programming assignment. The course uses the same mix of materials as “Introduction to Databases” above and has the same support system. As with “Introduction to Databases,” the added benefit of completing the course, beyond the learning itself, is the opportunity to earn a formal certificate recognizing the knowledge acquired, which may be useful to those hoping to apply their database skills professionally.
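
The MapReduce model that unit introduces can be sketched in a few lines of plain Python; this shows the programming model only (word count, the canonical example), not Hadoop itself:

```python
from itertools import groupby

def map_phase(docs):
    # Map: emit a (word, 1) pair for every word in every document.
    for doc in docs:
        for word in doc.split():
            yield (word, 1)

def reduce_phase(pairs):
    # Shuffle: group emitted pairs by key; Reduce: sum the counts per key.
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield (word, sum(count for _, count in group))

docs = ["big data big ideas", "data about data"]
counts = dict(reduce_phase(map_phase(docs)))
print(counts)  # {'about': 1, 'big': 2, 'data': 3, 'ideas': 1}
```

In a real Hadoop job the map and reduce functions run on different machines and the framework handles the shuffle, but the contract is the same.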

Keywords: MOOC, University of Washington, video lectures, assignments, exams, peer-support

Audience: beginner-intermediate database users


Posted in Careers And Work, Education, Information Technology, Resource-a-rama | Tagged , | Leave a comment

Data Visualization Resources: An Annotated Bibliography


1.     The St. Louis Federal Reserve

The St. Louis Federal Reserve Economic Data (FRED) series is perhaps the most comprehensive repository of economic time-series data, and it offers an in-browser, cross-platform data visualization tool. The time series are collected from a huge variety of US government sources as well as a number of international organizations, such as the OECD and the World Bank. The FRED tools, including charts, graphs, and maps, are extremely simple and user-friendly. They lack the flashy design of other data visualization kits, but they preserve a consistent and legible style across all platforms.

Keywords: FRED, Federal Reserve, time series, Bureau of Labor Statistics

Audience: economists, political scientists, advocacy groups, journalists

2.     Google Charts

Google Charts is a simple browser-based data visualization utility designed specifically for web use, including data sourcing and display. Rendering of the charts is extremely robust across browsers, making Google Charts a good choice for projects where browser compatibility is a priority. Google Charts is tightly integrated with Google Spreadsheets, including dynamic updating of a chart when its source data changes. The statistical processing available in Charts is basic, making it a poor choice for complex analyses. The styling of the charts is likewise basic, in keeping with Google’s minimalist aesthetic, and is not customizable with CSS.

Keywords: Google, HTML

Audience: students, teachers, advocacy groups


Posted in Careers And Work, Data Visualization, Information Technology, Resource-a-rama | Tagged | Leave a comment

Game Design and Choices of Creation

Concerning my self-imposed goal to write and produce games: human decision-making is largely fueled by seeking out novel or familiar stimuli, avoiding previous pain points, and repeating pleasurable experiences. We need to keep this paradigm in mind: people avoid previously painful experiences, repeat pleasurable ones, and are pushed and pulled both by existing behavior patterns and by opportunities for new life experiences. Maybe my own decision-making and goal-setting are simply fueled by the fact that I haven’t had painful experiences here, or perhaps my threshold for cerebral adventure is higher than most people’s.

Writing is a private event made public: when you write, you’re putting your words into context for an audience. Game creation is also a private choice you can make public, in much the same way a writer publishes a book. I’ve got one game set fairly finished, and another in production, being prepped for its first prototype printing. These started as ideas and had to be fostered into reality, brought forth one conceptual structure at a time until the framework was present; only then could the cards be developed and put into print. From there, many more iterations and changes have to take place before a finished product can be sold.

As it is, I will probably be putting my first game on Indiegogo or Kickstarter in order to raise funds (kind of like a pre-release of the game, but without a corporate sponsor). Afterwards, I’ll have more time to work on the second game in development. When you create something of lasting value, it is like an errant child: sometimes it circles back and you realize what you could have done differently. That’s part of the sacrifice of releasing your creations into the world; once you’ve let go, stop grasping.

Posted in Gamecraft, Resource-a-rama, Wordplay and Commentary | Leave a comment

Jaro-Winkler in Oracle and textual fuzzy matching

There is a little-known (and hence heavily under-utilized) function in Oracle 11g and up: the Jaro-Winkler algorithm (along with its companion algorithm, Edit Distance). The Jaro-Winkler score tells you, roughly, how much permutation would be necessary to transform ‘String A’ into ‘String B’.

You can find the official Oracle documentation here. I implemented it using the built-in Oracle package UTL_MATCH, which is used with SQL code similar to:
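
For instance (the string literals and column aliases here are just illustrative), you can query both Jaro-Winkler and Edit Distance side by side:

```sql
-- Compare two strings on every scale UTL_MATCH offers.
-- JARO_WINKLER returns 0-1; the *_SIMILARITY variants return 0-100.
SELECT UTL_MATCH.JARO_WINKLER('shackleford', 'shackelford')             AS jw_norm,
       UTL_MATCH.JARO_WINKLER_SIMILARITY('shackleford', 'shackelford')  AS jw_pct,
       UTL_MATCH.EDIT_DISTANCE('shackleford', 'shackelford')            AS edit_dist,
       UTL_MATCH.EDIT_DISTANCE_SIMILARITY('shackleford', 'shackelford') AS ed_pct
  FROM dual;
```

In practice you would join your two source tables and apply these functions to the candidate string columns, filtering on a similarity threshold.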


A vitally important feature of UTL_MATCH is that it lets you measure similarity on both a normalized (0-1) scale and an integer (0-100) scale. By close examination, you can see the degrees of difference involved in different string permutations. I used it to match diagnoses from Medicare CMS data to our internal data, but the functions are versatile and not confined to any specific application (any text will work).
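
For intuition about the companion algorithm, here is a textbook Levenshtein edit-distance sketch in Python (a generic implementation for illustration, not Oracle’s internal code):

```python
def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance: the minimum number
    # of single-character inserts, deletes, and substitutions needed to
    # turn string a into string b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if chars match)
            ))
        prev = cur
    return prev[-1]

print(edit_distance("kitten", "sitting"))  # 3
print(edit_distance("0100", "100"))        # 1
```

A 0-100 similarity can then be derived by normalizing the distance against the longer string’s length, which is the general idea behind the *_SIMILARITY variants.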

Note: within the Jaro-Winkler function, strings starting with ‘0’ will not match strings that don’t start with ‘0’, though they compare normally with Edit Distance. This was something I spotted empirically; it isn’t documented in the literature anywhere I’ve found.

Example: DX ‘0100’ compared to DX ‘100’ will return about a 95 with Edit Distance, but a 0 with Jaro-Winkler.

Posted in Data Science, Information Technology, The Cloud, Wordplay and Commentary | Leave a comment

Working in Data: ORDER CAREERS, DESC

I’ve had the opportunity to work in health data for a while now, and there are most definitely gradations and ranks in the data-career universe. In my first role as a health technologist, I gained a massive wealth of practical knowledge about what works and how to achieve small (and large) goals.

At smaller health care centers, somebody often needs to do double duty in the clinic. In my work at Ke Ola Mamo in Honolulu, I not only built out the Cognos engine for our analytics, but also digitized paper records, helped develop workflows for case managers, and customized the electronic medical record system to fit the clinic’s needs (probably saving the organization $20,000 or so in consultant fees).

In addition, I mapped the clinic population to KML for Google Earth and built a reporting engine to extract system data and deliver staff productivity reports to management. Importantly, keep in mind this was my first job in healthcare analytics.

Had this been a larger health system, my role would have been much more constricted and specific. The breadth of the role was dictated by the lack of IT staff and the available budget, not necessarily by the original job description. When I saw an opportunity to develop or experiment with data in one of its permutations, I grasped it (hence the reporting engine and the KML analysis). From this, I’ve got three points of advice: 1. I acted first; 2. I informed management of my end goals rather than my processes; 3. I understood that not all my development efforts would be rewarded equally.

Indeed, point #3 is crucial: not everything will succeed. Often multiple incarnations of a project are required to reach a polished result; Rome wasn’t built in a day. Undertaking projects means being able to spend concentrated time and focus until you’ve got working results. The world is big on people who dream big, but fruitful action is in short supply.


Posted in Careers And Work, Digital Libraries, Information Technology | Tagged , , , , , , , | Leave a comment

Game Theory and Game Design

I love making games, specifically card games. Card games are a good development medium for so many reasons: they’re concise, simple, and familiar.

I’ve got two games in production: one is a variant of the classic card game War, and the second is loosely based on the genre of (very) short adventures. One focus of both games is that they’re playable in 30-45 minutes or less; in fact, the War variant is playable in 10-20 minutes.

Another critical feature of the games I create is that they’re modular – components can be swapped. This involves more work up front, but the result is a game with more versatility. The goal is that people love the games and want to replay them to pass time. I think that’s why cribbage is such a long-lasting staple of card games – it’s compact but has flexibility and complexity.

Finding the correct level of complexity in a game is a fine line: there’s always room to make a system more complex, but complexity immediately impacts how quickly and easily a game can be learned and played. It’s also harder to simplify a game than to add complexity; this is almost a universal maxim in design (not just game design, but UX and other fields as well).

With the Time Warp card game (the War variant), somebody should be able to learn to play within 5 minutes and finish a first game within 15 minutes (20 minutes total). The adventure game is a touch more complex, but it is learnable within 15 minutes and playable in 45 minutes, for about an hour total.

Short play times are crucial to my happiness as a designer (and as a player!): nobody wants to be bogged down in hours of ridiculous, boring game theory and useless activity. Quick, simple, and cheap: this is the winning combo for game design (or any other product).


Posted in Resource-a-rama | Tagged , , , , , | Leave a comment

The Yin-Yang of Understanding Data

There are several issues with how data is received. One is that it’s viewed with suspicion; conversely, it is also sometimes held to sacrosanct integrity. I’d almost refer to this as the yin-yang of understanding data.

When I present findings or conclusions from healthcare data, people often refuse to give the data due credence. Largely this stems from political roots or from an ingrained sense of self-knowledge (wherein the assessor trusts their own anecdotal evidence over the data itself). This is the yin portion of data perception.

The other schema of understanding data is an over-reliance on data analysis to validate or make decisions. I had a supervisor, a subsidiary information officer, whose favorite quote was ‘what gets measured, gets managed’. The line is often attributed to Peter Drucker, though its true origin is disputed.

Another rather bright fellow always made sure to explain these concepts with a caveat: if you mis-measure, you’ll mismanage. For example, in hospitals a key metric is room utilization and efficiency. Not all departments or surgeons are equally efficient, and finding a fair performance level for their work was crucial to retaining top physicians and ensuring their compensation was fair. Laying down a blanket 50% utilization target would have been grossly unfair to the vast majority of doctors, while still eliciting protest from the bottom two quartiles. Clearly, there needs to be a better way to manage efficiency and performance at all levels. One key complaint I’ve heard is that companies lose crucial employees by not realizing what they contributed: another classic example of mis-measurement. The work wasn’t accounted for, but it was still being done.

In these latter cases, the classic decision-making model was supplemented by data, but the distinct possibility of faulty or misguided analysis made wrong decisions not just likely, but almost certain. The human element of error was compounded by the data.


Posted in Careers And Work, Data Science, Information Technology, Resource-a-rama | Tagged , , , , , , , | Leave a comment

Future Works

I’m really looking forward to getting my new desktop computer. My wife and I got a newer system for Xmas (in the Futurama tradition of holiday names), and I plan on crunching more of the World Bank data into KML.

One issue with the KML transformation is that it was performed on a single desktop over a few weeks. While data creation and transmogrification aren’t necessarily that heavy processing-wise, the sheer number of individual, disparate data sets generated from the World Bank master economic data set was overwhelming. Still, it reminds me of how I accomplish most of my tasks: make each individual step a bite-sized morsel, and then chew away.
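
The per-record work is mostly bookkeeping. Here is a minimal sketch of the kind of KML placemark generation involved (the names, coordinates, and field layout are invented for illustration, not the actual World Bank schema):

```python
from xml.sax.saxutils import escape

def placemark(name, lon, lat, description=""):
    # One KML Placemark: a named point with an optional description.
    # KML coordinates are ordered longitude,latitude,altitude.
    return (
        "<Placemark>"
        f"<name>{escape(name)}</name>"
        f"<description>{escape(description)}</description>"
        f"<Point><coordinates>{lon},{lat},0</coordinates></Point>"
        "</Placemark>"
    )

def to_kml(rows):
    # Wrap (name, lon, lat, description) rows in a complete KML document.
    body = "".join(placemark(*row) for row in rows)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
        + body + "</Document></kml>"
    )

# Hypothetical rows derived from one indicator in the master data set.
rows = [("Honolulu", -157.8583, 21.3069, "sample indicator value")]
doc = to_kml(rows)
```

Generating one such document per indicator per data set is exactly the kind of small, repeatable step that adds up over a few weeks on a single desktop.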

So, in the next few weeks, expect to see more World Bank KML data (as my actual daily workload permits) and a new card game synopsis. I’ve got some decidedly non-technical creations in the pipeline, and hope that you like them!

Lastly, I’m thinking about pursuing some technical writing about RapidMiner 6.x (which unfortunately has become a commercial product with open-source roots). If you’d care to leave a comment about what type of technical documentation would be most helpful for predictive modeling software, I would be grateful; it would let me get closer to the user base and produce work with more vitality.


Posted in Data Science, Information Technology, Resource-a-rama, Wordplay and Commentary | Tagged , , , , , , , , | Leave a comment

Hot Jobs in Tech: Technology As An Individual Economy

First off, I wanted to share this infographic created by Kforce, one of the staffing agencies that handles large corporate accounts. Take a moment and look it over. You’ll probably notice that architect positions and mobile developers are in high demand, which is borne out by the high wages.

Hot Jobs in Tech


While location matters to a certain degree, the key is that these are the top tech skill areas. Even without these specific skill sets, people in tech earn an average of roughly $88,000 a year (BLS). With the median American household earning roughly $50,000 annually (Wikipedia), it becomes apparent that tech is a way out of the conundrum of low or stagnant wages.

Another critical element to consider is unemployment. It could be ventured that technology (and perhaps STEM in general) forms a kind of mirror economy, one that reflects a different world for those who live there. I’m speaking of the unemployment rate for IT staff: for college grads overall, it is roughly 4%; for skilled technology workers, it hovers around 1-2%, depending on skill set and experience level.

Working in tech on predictive modeling projects, executive-level summaries, and key-indicator methods, I can see that the world I live in is different. The mirror economy of technology is one where recruiters contact you daily trying to fill openings, and the ratio of qualified candidates to vacant positions is 1:2. That’s right: there are two open positions for each person who can do the work properly. This is a primary reason the technology world is a Bizarro employment market. It’s a land of opportunity, even while the rest of the globe slowly lumbers under youth unemployment and stagnant economic growth.


Posted in Careers And Work, Information Technology, Resource-a-rama, Wordplay and Commentary | Tagged , , , , , , , , | Leave a comment

OEDB and Trends in the IT Field

I was recently contacted by Xavier Gray and his colleagues, who requested that I write a bit about their website (I wouldn’t truly call it a database; it’s more akin to a career/education guide). The website in question is the OEDB, which stands for the Online Education Database.

If you’re looking for information about online schools, the OEDB is one resource to examine. It completely missed Drexel University, currently one of the top-ranked information systems programs in the USA, and it includes a fair number of smaller Christian/sectarian schools, but I’m not worried about the lack of focus. As with all graduate school and college/university resources, you have to evaluate who the target audience is. With the OEDB, I assume they are targeting people thinking about returning to school after a hiatus, or perhaps somebody who cannot otherwise attend a university.

To their credit, the designers of the OEDB have included graduation rates, employment rates, and retention rates for a fair number of their listed/ranked institutions. While I’m not sure what methodology they use in determining rankings, the fact that the OEDB staff have wrangled these figures out of school administrations is a success. I’m all for academic transparency, and I think it speaks well of an institution when it is public with its data.

With that caveat made clear, a majority of the schools list graduation rates below 40%. Why attend college at all if you’re not going to finish? The time and effort may be better spent on something like edX or Coursera, especially given the IT trend of growing needs and fewer graduates in the technical disciplines. Seeing the graduation percentages is, frankly, depressing.

Unless we can turn ourselves around as a country, we’re going to end up importing ever-growing segments of our IT labor, with a concurrent devaluing of skill levels and global marketability. With research showing that H-1B labor is no more skilled than US college grads overall, there will continue to be a shortage of talent in the field that process automation cannot overcome rapidly enough.

Posted in Education, Wordplay and Commentary | 1 Comment