Yahoo Pipes Mash-Ups and Future Projects

I have been experimenting with mash-ups, with the ultimate plan to incorporate them into mobile applications and websites. Yahoo Pipes offers an easy way to create mash-up systems. Pipes is primarily built around RSS, JSON, XML, Flickr, and geocoded data on Yahoo Maps. Even with this limitation, the interface is slick and strongly reminds me of RapidMiner 5.1. As a matter of fact, someone must have been stealing ideas, but I have no idea which came first. Pipes does not include a data mining tool, but I am absolutely sure that I can feed Pipes output into RapidMiner 5.1 as input. How fantastic would that be?

Go take a look around! There are several good tutorials available on the Pipes site, and also quite a few videos floating around on the Net. Once I take care of a few things, I will be posting the interactive map and RSS feed to the system. That will probably be in a few weeks, once I have a chance to fully plan out the rest of October and November’s techscapades.

http://pipes.yahoo.com/pipes/murakamibrundagetechfeed

Posted in Information Technology, Media Sharing, Resource-a-rama, The Cloud

Data Mining with RapidMiner 5.1, GATE, and Weka 3.6.4

Over the last 6 months, I have been working intensely with a host of data mining software. Some of it was good, some of it was lousy, and some of it I can only rate as excellent. You will need to see my later posts in order to get a view of some of my results, but the software itself deserves a bit of praise.

RapidMiner 5.1 is probably the crown jewel of the software that I worked with. It is visually appealing and fairly simple, and I did most of my data mining with it. I still work with it, and love to delve into problems and data sets using the built-in algorithmic learning tools. I have to say that the Web scraping combined with the clustering and Naive Bayes algorithms can pull some great results out of nearly any dataset. I do, on the other hand, need a stronger processor.

Weka 3.6.4 gets an honorable mention for being an awesome piece of software. I guess it doesn’t rate higher on my list of open-source goodness because it is included as a software suite that I can download with RapidMiner. It definitely rocked my boat, and is a great place to start learning data mining basics.

GATE is by far my #1 choice for text parsing. I was able to feed in an entire directory of text files and extract relevant material in only a minute or so. The drawback to GATE isn’t the system’s GUI, but more that it needs a little more documentation. I found myself going down dead ends trying to get things smoothed out. One thing that I love about GATE is that it can pull key words such as names and places – and the key words weren’t restricted by Anglocentric methods. GATE’s learning methods pulled ‘Abdul’ and ‘Sheik’ as related constructs – nicely done!

Posted in Information Technology, The Cloud, Wordplay and Commentary

Sample Code for Oracle 10g: DECODE, applied

I was having fun with the DECODE function while playing around with Oracle 10g. For the record, the description of DECODE on Oracle’s website is about as clear as mud. I guess there is no easy way to explain a computer method, but I did not find it useful.

So, here we go. Say you have employees that you need to divvy up into categories by hours worked, and you want Oracle (or whatever) to automatically do it. You could use the following code:

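SELECT EMPLOYEE.LNAME, EMPLOYEE.FNAME, WORKS_ON.HOURS,
-- LEAST caps the bracket at 3, so 40 or more hours still codes as 'A'
DECODE(LEAST(TRUNC(HOURS/10), 3), 0, 'D', 1, 'C', 2, 'B', 3, 'A') CODE
FROM WORKS_ON, EMPLOYEE
WHERE EMPLOYEE.SSN = WORKS_ON.ESSN

So, what you have here is DECODE in all its glory. It takes the hours each employee worked during the span (probably weekly), divides them by 10, and truncates the result in order to round down. LEAST then caps that value at 3, so the result is always between 0 and 3, and an employee with 40 or more hours still lands in the top bracket instead of falling through DECODE with no match.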

DECODE then assigns a CODE using this method; the final result looks something similar to the following:

Smith John 32.5 A
Smith Grace 7.5 D
Ramesh Nariyan 40 A
English Joyce 20 B
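For anyone who thinks better outside of SQL, here is a rough Python sketch of the same bracketing logic. The decode() helper is my own stand-in, not anything Oracle ships, and the cap at 3 is an assumption added so that 40 hours codes as 'A', the way the sample output shows.

```python
import math


def decode(expr, *args):
    """Rough stand-in for Oracle's DECODE(expr, search1, result1, ..., [default])."""
    pairs, default = args, None
    if len(args) % 2 == 1:
        # An odd number of trailing arguments means the last one is the default.
        pairs, default = args[:-1], args[-1]
    for search, result in zip(pairs[0::2], pairs[1::2]):
        if expr == search:
            return result
    return default


def hours_code(hours):
    """Bracket hours by tens (rounded down), capped at 3, then map to a letter."""
    bracket = min(math.trunc(hours / 10), 3)
    return decode(bracket, 0, "D", 1, "C", 2, "B", 3, "A")


if __name__ == "__main__":
    for lname, fname, hours in [
        ("Smith", "John", 32.5),
        ("Smith", "Grace", 7.5),
        ("Ramesh", "Nariyan", 40),
        ("English", "Joyce", 20),
    ]:
        print(lname, fname, hours, hours_code(hours))
```

Without a default and without the cap, a value that matches none of the listed searches simply comes back empty (NULL in Oracle, None here).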

You could use the DECODE for grading, sales, or pretty much anything that requires coded output. There is also a nifty little trick with DECODE that I discovered:

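SELECT EMPLOYEE.LNAME, EMPLOYEE.FNAME, WORKS_ON.HOURS,
-- one DECODE per output column; each row is filled only in its own bracket
DECODE(TRUNC(HOURS/10), 0, 'D') CODE_D,
DECODE(TRUNC(HOURS/10), 1, 'C') CODE_C,
DECODE(TRUNC(HOURS/10), 2, 'B') CODE_B,
-- LEAST caps the bracket at 3 so that 40 or more hours also lands in CODE_A
DECODE(LEAST(TRUNC(HOURS/10), 3), 3, 'A') CODE_A
FROM WORKS_ON, EMPLOYEE
WHERE EMPLOYEE.SSN = WORKS_ON.ESSN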

This DECODE yields a slightly different spin by coding in a matrix:

LNAME    FNAME    HOURS  CODE_D  CODE_C  CODE_B  CODE_A
Smith    John     32.5   -       -       -       A
Smith    John     7.5    D       -       -       -
Ramesh   Nariyan  40     -       -       -       A
English  Joyce    20     -       -       B       -

Nicely divided into a matrix, for all those reporting needs.
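The matrix trick boils down to "one column per code, filled only when the row falls in that bracket." A minimal Python sketch of that idea follows; the function name and the dictionary layout are mine, and the cap at 3 is again an assumption so that 40 hours shows up under CODE_A.

```python
import math

# Bracket value -> letter code, mirroring the four DECODE columns.
CODES = {0: "D", 1: "C", 2: "B", 3: "A"}


def code_columns(hours):
    """Return one entry per CODE_x column; only the matching bracket is filled."""
    bracket = min(math.trunc(hours / 10), 3)
    return {
        f"CODE_{letter}": (letter if bracket == value else None)
        for value, letter in CODES.items()
    }


if __name__ == "__main__":
    for hours in (32.5, 7.5, 40, 20):
        print(hours, code_columns(hours))
```

Every row gets exactly one filled column, which is what makes the result easy to drop into a report.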

DECODE can be made a little less obfuscated by practical examples. As one friend stated on Facebook: “I learned more math in Physics or Statistics than I ever did in Calculus.” Applied mathematics and applied coding are still the best ways to learn.

Posted in Information Technology, Resource-a-rama

Installing a Paypal Button on WordPress

A popular topic is installing a Paypal button on WordPress, in the hopes that someone will like your blog and donate money to the worthy cause of self-publishing. While I don’t know if this ever properly works out (i.e. I know a lot of writers don’t necessarily get paid for their blogging work), it seems worth the effort.

Now, I don’t have a Paypal button, and it isn’t likely that you will see one anytime soon. This is mostly a personal choice, coupled with my belief that Donate buttons don’t draw in any real donations. My evidence for this is spawned from Facebook, where I see multiple causes with thousands of people who ‘Like’ something, but nary a dollar given to the subject.

Anyways: Installing a Paypal button is fairly straightforward if you know basic XHTML and can work some WordPress hackery. There are instructions on the WordPress Support pages here: http://en.support.wordpress.com/paypal/. The steps are easy enough to follow, but it can help to have someone with a little savvy assist with them. I am not going to rehash the contents, just give a few pointers.

One good way to build the Button is to open a blank Post in WordPress. This will be your ‘workspace’, where you can build your Paypal button. Follow the instructions on the page.

Pay close attention to #6. You don’t want the Website code; you are going to link the Paypal Button to your email account. This is because WordPress will strip out any code that may be harmful, which includes the Paypal/Website code. After all, WordPress doesn’t know you are building a donation button. The WordPress.com system is set up to protect you and other users, which includes being strict when it comes to allowing programming code on a blog.

Under instruction #9, highlight and copy the code for the Button you like (Keystroke: Ctrl-C, or Apple-C on a Mac).

The Button code is the part of the webpage that reads

<img src="https://www.paypal.com/en_US/i/btn/btn_donate_LG.gif" alt="" />

Go into your ‘workspace’ Post. Select the word HTML (next to the word Visual, on the top right side of the Editor).  Paste in your Button code that you copied. Copy the link from the Paypal Email page (See #6 on the original instruction page).

There is a lot of programming stuff between #10 and #11 on the Support page. Someone must have been tired by the time they got to this point, because the gaps are huge for a novice WordPress blogger/designer. The first time I did this, I realized that half the Paypal button is covered in #1-9 of the Support Page, and the other half of the Paypal button design is covered by #10 and #11. Select the word ‘Visual’ on the ‘workspace’ Post, click on the Paypal Donate button, and select the Create Link (it is supposed to look like a chain, but actually looks like a pill). Paste your email link.

Save your ‘workspace’ post as its own post, label it something like ‘Paypal Button’, and save it as a draft. This is so you will have a back-up of your Button, as well as have a button already built. Think of it as your own code library, right there on WordPress.com.

Create a Text Widget and copy your HTML into the new Widget. You can get the HTML from your Paypal Button by selecting ‘HTML’ (next to Visual), and copying the entire paragraph of computer code into the Widget. Then you should be ready to go.
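If it helps to see where all of this ends up, the finished widget HTML is just the button image wrapped in a link. Here is a minimal Python sketch that assembles it. The function name is mine, and the example.com address is a hypothetical placeholder standing in for the email link you copied from Paypal in #6; it is not a real Paypal URL.

```python
from html import escape

# Button image from the Support page instructions.
BUTTON_IMG = "https://www.paypal.com/en_US/i/btn/btn_donate_LG.gif"


def donate_button_html(email_link, alt_text="Donate"):
    """Wrap the button image in a link; attribute values are HTML-escaped."""
    return '<a href="{}"><img src="{}" alt="{}" /></a>'.format(
        escape(email_link, quote=True),
        escape(BUTTON_IMG, quote=True),
        escape(alt_text, quote=True),
    )


if __name__ == "__main__":
    # Hypothetical placeholder; paste your own Paypal email link here.
    print(donate_button_html("https://example.com/your-paypal-email-link"))
```

The printed line is the sort of thing you would paste into the Text Widget. It contains no script or form code, only a link and an image.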

On the other hand, if you prefer for someone else to do it, then you can drop me a line. My rates are fairly reasonable, and I can get it done within a day. Not a plug, just an understanding that many people would prefer to have someone else solve a technical issue like this.

Posted in Information Technology, Resource-a-rama

Perspectives on Oracle 10g Express

I have been working with Oracle 10g (and mySQL 5.1/5.5), and have come to some interesting conclusions regarding the Oracle SQL, as well as PHP.

Oracle has its own SQL dialect, plus its own command-line client, SQL*Plus. While I won’t get into the gritty details right here, it should be evident that Oracle doesn’t do things half-way. They have pretty much retro-fitted SQL in order to fit their specifications. Oracle’s SQL works and works well.

The great thing about 10g is that it presents with a pretty fantastic GUI system, kind of like Cognos and mySQL Workbench 5.2 all wrapped together in a smooth Oracle shell. I don’t mean to gush about the GUI, but I put it up there with Joomla!’s user-friendly GUI for smoothness. As far as DBA stuff, that still needs to be learned, and no GUI will ever make the learning curve go down.

In the area of freely deployed database software, 10g Express (the system that I am using) is free of charge, and has all the standard 9i versatility. 10g Express has a limited footprint, so you may not want to roll it out as your production foundation. On the other hand, I don’t think that I have ever had a system reach 5GB in size – I am not sure what this says about the scope of my system designs. I will need to double-check my Joomla! sites, but that would be quite the website (Joomla! runs on mySQL anyways, but hey…)

It is doubtful that Oracle 10g will ever become the basis of numerous open-source systems, since the combination of marketing restrictions and the open-source databases (mySQL 5.5 and its ilk) present such intense competitive factors. That said, 10g is a great platform to learn on, and definitely presents an alternative to standard mySQL.

I guess that I should end on a note: mySQL is also an Oracle product, and there are other free database systems available. mySQL 5.5 reached general availability in December 2010, and mySQL is designated to remain open-source until 2015. After that, Oracle is likely to cinch down ever so tightly on their now-copyrighted (non-open-source) work, with all the chaos that this would cause. When the time comes, be prepared to make the shift from mySQL, BerkeleyDB, or 10g to a cheaper alternative.

Posted in Information Technology, Resource-a-rama

Antipsychotic Costs, public buying power comparison, May 2010

This is an image of a Many Eyes visualization comparing costs of prescriptions (public buying power vs. private payer). It highlights the difference that a mass purchasing program makes. For example, some of the medications are so cheap under the public program that they barely appear on the graph, such as risperidone (Risperdal). By comparison, the cost at drugstore.com is $400/month.

The focus is to highlight what an uninsured person would pay for these prescriptions. It is obvious that the public buying pool is able to push down the cost of these expensive medications far beyond what a drugstore can sell them for (drugstore.com was used as a basis because no pharmacy data was available).

Ball-and-stick model of risperidone

Image via Wikipedia

When insurance enters the picture, the scenario changes to a market situation, rather than a buyer/seller basis. Modeling this data may happen in the future. Prices were matched as of May 2010. All data was retrieved from drugstore.com and the Oregon Prescription Drug Program.

Posted in Information Technology, Resource-a-rama

Creativity and Information Technology

Charles Babbage's Difference Engine 2 at the S...

Image by Kevglobal via Flickr

I have two questions regarding computers, business, and creativity: 1) Can you learn to be creative? 2) Why is technology at the forefront of creative business? Before I start, I should preface this with the caveat that I like to create beautiful things, but my business acumen is startlingly nonexistent. Thus, I am addressing only the creative aspects of technology, rather than the profit-making side.

On the surface, one would immediately assume that creativity is crucial to success in business. Steve Jobs, arguably one of the most successful men (twice-over, given that he founded Apple, left, and then forged the brand anew) recently gave a speech where he argued against market research. His statement, roughly paraphrased, is that “customers don’t know what they want.” His argument was against the tail-chasing market research that is so prevalent today. This is a common thread in creative matters. Don’t let someone else, or society, dictate your creation – just go do it. Apple and Twitter don’t rely on people telling them what they should make.

I don’t think that creativity is necessarily a taught behavior. Some people are more playful and spontaneous, and this shows in their work. The key to innovation is harnessing this and wedging it into those technical matters – creating something new is still mostly sweat, blood, and love. The BBC has a great series of vignettes with the foremost technical innovators in the U.K. The common themes: Creativity must be blended with hard work, and you should be prepared to fail in a spectacular manner several times.

Why is technology the foremost arena for entrepreneurship? It isn’t a matter of technology and applications being the major player in innovation: It is just that the threshold is lower for entry. Any decent desktop (or even a powerful laptop) can access thousands of open-source development tools. For instance, I regularly use RapidMiner, Weka, Joomla!, Eclipse – the list goes on. For less than $1000 in hardware, anyone with an iota of interest can set up a full production suite. This is the equivalent of a complete scientific laboratory, and the software itself is free.

When you combine free tools with humanity’s unlimited creative potential, the growth curve is going to be exponential. One thing that is fantastic is the open-source movement, and how it is embracing meta-tools. Eclipse, Ubuntu (Linux for Dummies, OS style), mySQL, the Wikis (i.e. Wikipedia and its far-flung relatives) – all of these make technology a self-perpetuating machine. Charles Babbage would be proud to know what new paradigms the Difference Engine has created.

Posted in Information Technology, Resource-a-rama

Adobe Acrobat and Medical Record Archiving

As part of my daily work routine, I archive medical records. This process involves various bits and pieces – scanning the file, ensuring data integrity, loading to the medical record server, etc. The tool that I use is Adobe Acrobat. Actually, I recommend using an off-the-shelf scanning application for any serious archiving project.

Whether it is the Neat Scanner (out of Philadelphia) or Acrobat, there is a level of sophistication that these tools have that borders on the uncanny. Acrobat will perform pretty high level optimization, and will render most text legible and searchable (which also means that it is indexable). While many medical record systems come with a module that will allow scanning into the record, I have never found a built-in scanning system that is worth using. For archiving or any type of record processing, it is truly best to use a tool built for the job, not something attached in order to meet a Federal requirement (medical record systems need to have scanning ability, but nothing says it is mandated to be great).

I think the key is that Acrobat, the Neat Co., and other data processing tools are industry standards. If you need a scanner that will automatically process your documents, go to the Neat Co. Acrobat will process, optimize, color scan, and properly handle documents. The medical record system we use, out of the box, will barely scan anything into JPG format – and forget about extracting those documents for later use without jumping through hoops.

If medical record systems want to actually build a useful component, they need to find an industry standard or partner with a document management corporation. This is actually an undervalued approach; I suspect that most companies hesitate due to cost. The payoff, on the other hand, could be well worth it. As medical records are barely regulated, this would be a method to establish market dominance and also put pressure on the competition (i.e. having a fully-functional document processing module is a great leverage point when negotiating with Federal and industry regulators).

Posted in Information Technology, Media Sharing

Global Lifespan by Nation and Quality of Life

Life Expectancy and Health Care Costs

There is an assumption that we in the U.S. will have longer life spans due to our economic strength. Tied to this is the opposing thought – that our health will suffer if we spend less or grow weaker in the global markets. I cannot say for certain this is false, but my research in global healthcare has pushed me towards thinking that our health has very little to do with economic strength, and much more to do with quality of life.

Quality of life is something that researchers have just begun to explore, and it still falls into the category of ‘I know it when I see it’ (thanks to the Supreme Court for that phrase!) On the other hand, GDP and lifespan are two factors that we have fairly good numbers on. Thus, I present the interactive map showing lifespan and health care spending in US dollars per capita, per year. My conclusion? That the amount spent has a tipping point (akin to a logarithmic curve) after which returns are minimal. Once a country spends more than ~$3000-3500 per person/year, there is little more return on the investment. After this, other factors must kick in.
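To show what ‘akin to a logarithmic curve’ implies, here is a small Python sketch. It assumes lifespan follows base + slope * ln(spending). The base and slope values are hypothetical parameters that you supply, not numbers fitted to the data behind the map.

```python
import math


def modeled_lifespan(spend, base, slope):
    """Lifespan under an assumed log curve: base + slope * ln(spend)."""
    return base + slope * math.log(spend)


def gain_from_extra(spend, extra, slope):
    """Years gained by raising spending from `spend` to `spend + extra`."""
    return slope * math.log((spend + extra) / spend)


if __name__ == "__main__":
    slope = 1.0  # hypothetical; only the comparison between rows matters
    for spend in (300, 1000, 3000, 6000):
        print(spend, gain_from_extra(spend, 100, slope))
```

Under a curve like this, the same extra $100 buys less the more a country already spends, which is the tipping-point behavior I am describing.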

Life Expectancy in the United States, 1900-200...

Image by Quiplash! via Flickr

Granted, the countries with the greatest longevity are mostly in the Western European/Northern American domains, but there are notable exceptions. For instance, China and Russia have roughly equal life spans, and the amount they spend per person is incredibly low as judged by US standards (the chart is available by clicking ‘see data’ at the bottom of the visualization). These countries are only a tad bit below us in lifespan, and successfully spend less than 1/10th of what we do. What is equally impressive is China’s lifespan compared to population. They are doing something right, and it may be that nebulous ‘quality of life’ that people in the US are just beginning to quantify.

Posted in Resource-a-rama, Wordplay and Commentary

Health Information Technology and IT Security

Attached is a presentation I created for one of my graduate-level networking classes. The content of the PowerPoint is more the focus of this post than the commentary.

It should be obvious that Internet security is the hot topic of the 2010-2020 era, and probably will extend beyond that span. While the threat of a cyberwar is sometimes overplayed (or underplayed), the threat of a constantly shifting array of forces that seek to gain advantage in a nebulous environment is real.

Medical record systems are overlooked by security agencies, and far too vulnerable to cybercrime. Any given medical database contains hundreds or thousands of vital patient records, which can fetch upwards of $50/patient. When combined with the fact that medical organizations pay an average of 3-4% of their budget towards IT, you have what economics terms an ‘undervalued’ resource. It won’t be a day too soon when hospitals are required to secure their systems and have at least one dual Health IT/Security person on staff.

Posted in Education, Information Technology, Resource-a-rama