Eddie Dunn's Blog

Musings and Insights from the Mind of a Human

Archive for March, 2013

Oz The Great and Powerful

without comments

OK, I am not exactly a movie or TV buff. In fact, when I woke up the morning I saw this film, I would have called anyone a liar who told me I would see it by the end of the day. So it goes with much in life: every now and then you get a completely unexpected tender morsel for contemplation.

For those who may not know, this story tells the tale of how the Oz of the original film came to rule. In fact, for IP-related reasons it would be more correct to say that both films are based on the book. I am not worried about spoiling the story because we all know how it ends. They did an excellent job of weaving in recognizable imagery from the original film. James Franco seems naturally suited to the womanizing, half-shyster carnival magician. Rachel Weisz steals the show with her performance. The sheer radiance that is Michelle Williams also plays nicely as Glinda, but this is not a movie review. I would hope that this would be read AFTER having seen the film.

Have you ever wondered if you were “good enough” or even “good”? Good enough to be this or do that, to take the leap of faith? Or just plain good in general? I know that in my life, too often when presented with real opportunity, instead of accepting the possibility without hesitation I find a tendency to recoil into the same old patterns of thought and behavior. Why is this? The longer I draw breath, the more I believe it is not a fear of not being good enough or not being good, but a fear of becoming that which everyone sees in you and doing more harm than good. It is so terribly easy for those in positions of influence to allow their egos to run rampant.

This film really drove home several key ideas that I think anyone living a conscious life should take note of:

  1. You will never be exactly the person you think you should be in all aspects of life.
  2. You will never be exactly the person whom others think you should be in all aspects of life.
  3. You will be put in situations where everyone is looking to you for guidance and you are just as unsure as anyone else.
  4. In situation 3, factors 1 and 2 must not stop you!

We are all shysters and con-men every time we puff up our egos to fend off anyone or anything that might knock us off our precious pedestal of illusion. Furthermore, life is not a pretty package with a bow on it. It is a wild beast that at times provides smooth riding, but when the inevitable storm comes it can turn into a bucking bronco, and it is all we can do just to hold on.

It seems that the older I become, the more I find myself feeling like everyone is looking at me to “make something happen”. I would say that in nearly every situation where this dynamic arises, you will never be the wizard everyone was expecting… but you just might be the one they need!

“You just have to make them believe” – While individuals will cease to be, ideals and values live on in the hearts and minds of everyone. Sometimes it takes a “wizard” to wake us up to this possibility. In reality, the wizard only shines a light back, illuminating what was already there lying dormant in each of us: the possibility of creating something better not only for ourselves but for everyone we come in contact with.

Written by tmwsiy

March 21st, 2013 at 9:37 am

Posted in thought


Drobo Linux Survival Guide

without comments

First off, when it comes to my own data I am a bit of a “rebel” in that I have adopted the practice of storing not-easily-replaceable personal “bulk” data (mainly live music recordings) solely on spindle drives instead of burning it to optical media. Five years into the process, after many copies and “failures” of all sorts, my luck has held. So know that as you read the following.

I have a love/hate relationship with the devices made by Drobo. On one hand, especially for the time at which they were released, with their Apple-like marketing and packaging, these little devices were all the rage and I bought in. I own two of the 4-bay models personally and have purchased two more 8-bay models (one Elite and one Pro) at work, and all still work. In fact, with the exception of a hosed partition table due to a dirty shutdown (no power button, really?) that I was able to recover, I have never been without my data, and considering the abuse I unleash on these things, IMHO that is an accomplishment. Having been through several cycles with these things, I do feel confident in recommending them for archive storage. While slow and quirky, these devices do a great job of sucking in and spitting out data.

Drobo does not “officially” support Linux. You can read a whole bunch of articles, but this is what you need to know in a nutshell:

  1. DO NOT EVER FILL UP THE DEVICE! If you do, you must copy off the data and start over if you want any confidence in your redundancy.
  2. Only use ext3. While I have had no real “problems” using these devices with LVM and ext4, after talking to some of the engineers and after a couple of near misses I decided to leave the configuration as vanilla as possible to avoid any potential issues.
  3. Ignore the lights on the device! Especially if you frequently add and delete data (such as in normal backup operations). Because the device is “data aware”, i.e. it keeps up with things on the block level and has no way to mark a previously used block as now free, in the backup use case things get wonky with respect to the lights very quickly. You must know how much capacity you actually have (through the drobolator) and then use df to determine how close you are (see the sketch after this list). Luckily, once you figure this out you are usually good for a while. :)
  4. Use drobo-utils to manage and check your device. It provides a command line tool, drobom, with many useful features including checking status, reformatting, updating firmware, etc.
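
To make point 3 concrete, here is a minimal sketch of the kind of check I mean, in Python. The mount point and capacity figure are placeholders; substitute whatever your own unit reports in the Drobo dashboard/drobolator, since df on a Drobo typically reports against the thin-provisioned volume size rather than the real protected capacity.

    #!/usr/bin/env python
    # Warn before the Drobo gets dangerously full (see tip 1 above).
    # MOUNT_POINT and REAL_CAPACITY_BYTES are example values only; use the
    # figures your own unit reports via the Drobo dashboard/drobolator.
    import shutil
    import sys

    MOUNT_POINT = "/mnt/drobo"            # hypothetical mount point
    REAL_CAPACITY_BYTES = 3.6 * 1024**4   # e.g. ~3.6 TiB of protected space
    WARN_AT = 0.85                        # complain at 85% of real capacity

    usage = shutil.disk_usage(MOUNT_POINT)   # same "used" figure df shows
    fraction = usage.used / REAL_CAPACITY_BYTES
    print("Used %.1f%% of the protected capacity" % (fraction * 100))
    if fraction >= WARN_AT:
        sys.exit("WARNING: getting close to full - copy data off or add drives!")

The point is simply to compare the “used” number df reports against the real protected capacity, rather than trusting the volume size or the lights.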


Hopefully this will help someone! If you have any questions I would be happy to try and answer them.

Written by tmwsiy

March 20th, 2013 at 4:26 pm

Posted in technology


Big Everything – The new frontier of computing

without comments

Eddie Dunn – University of North Carolina Wilmington

I. Introduction

The term “Big Data” is on track to eclipse the venerable “Cloud Computing” buzzword of recent years. In fact, the term big data is not new; it is traditionally and technically defined as data too large to be processed by anything short of a “super computer”. In today’s terms that means exabytes, but it is a constantly moving target and we will soon be speaking of zettabytes (1). I would suggest that this phenomenon is not limited to the space we have defined; a better term might be “Big Everything”. We have arrived at a crossroads in our fast-changing discipline that requires us to rethink very basic aspects of the way our craft has been performed for as long as most of us can remember. Hardware manufacturers have done a stellar job of keeping up with Moore’s law, even when the speed of light and power requirements forced them into parallel and data-parallel computational models, and the benefits of that work are coming to fruition in a big way. What is still evolving are the software tools and training needed for the rest of us to effectively utilize those resources, not only to tackle “big” problems but also in our day-to-day jobs and lives. With techniques like MapReduce (2) and its famous Apache Hadoop implementation and related ecosystem, and the proliferation of GPU and CPU cores, a computation that once took days can now, in theory, be performed in mere minutes (and soon seconds) with hardware of roughly the same cost.

II. Architecture

Michael Flynn in 1966 proposed a taxonomy of computing that lends itself well to discussing the different modes of computation coming online from the hardware vendors. He identified four major types of computer architecture (3).

1. Single Instruction, Single Data stream (SISD) – This is the von Neumann model that is taught in basic computer architecture courses. One instruction is executed on one piece of data at any given time. The traditional x86 (Pentium-class) chip is an example of this.

2. Single Instruction, Multiple Data streams (SIMD) – Also known as data parallel. A single instruction is applied to multiple pieces of data at once. The GPU is an example of this architecture; a small sketch contrasting this style with SISD follows this list.

3. Multiple Instruction, Single Data stream (MISD) – In this model multiple instructions are executed on the same piece of data. This is the least common architecture type. The redundant flight control computers of the space shuttle are often cited as an example of this computing model.

4. Multiple Instruction, Multiple Data streams (MIMD) – In this architecture multiple separate instructions are executed on multiple separate pieces of data at the same time. Modern multi-core processors from AMD and Intel represent this model. It is also commonly divided into sub-models: 1) Single Program, Multiple Data (SPMD), cited as the most common form of parallel computing; MapReduce in its most basic form follows this model. 2) Multiple Program, Multiple Data (MPMD), which can run on either a shared-memory model such as a modern multi-core processor or a distributed-memory model such as an MPI cluster.
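
To make the SISD versus SIMD (data-parallel) distinction concrete, here is a small sketch in Python; NumPy is my choice of illustration vehicle, not something the taxonomy prescribes. The explicit loop applies one instruction to one pair of elements at a time, SISD-style, while the vectorized expression hands whole arrays to a single logical operation that the library is free to dispatch to vector (SIMD) hardware under the hood.

    import numpy as np

    a = np.arange(100_000, dtype=np.float64)
    b = np.arange(100_000, dtype=np.float64)

    # SISD-style: one instruction, one pair of elements at a time.
    c_loop = np.empty_like(a)
    for i in range(len(a)):
        c_loop[i] = a[i] * b[i]

    # Data-parallel (SIMD-style): one logical operation over all elements,
    # free to be mapped onto vector units in hardware.
    c_vec = a * b

    assert np.allclose(c_loop, c_vec)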

In fact, many of the “rules of thumb” that we as a discipline have come to rely on apply, as it turns out, to only one case in Flynn’s taxonomy (SISD), and this oversight is rapidly overtaking us. SISD has reached the end of its ability to bring novelty to our discipline. The power of SIMD and MIMD is here now. We need to gather the communities, tools, and paradigms to confront these challenges in a proactive manner. It is with the sharper lens and better model resolution afforded by massively parallel computation that we can, as a discipline, keep up our end of the pragmatic and life-saving power that is modern technology.

III. Languages

On the coding side of this picture, it was once thought that in object-oriented languages we could solve the problems of concurrency and the non-determinism it introduces through bolt-on language features. This has taken us a large step toward that end, yet anyone who has done any high-concurrency programming with these features can attest to the very subtle and hard-to-debug issues these bolt-on additions and constructs tend to create. We have heard for years now the buzz created by multi-paradigm languages such as Scala, F#, and the like, and the mantra of why we all need closures and functional language features. Recent work concludes that, contrary to popular belief, functional languages do not incur the performance penalty once thought (4). There is no argument that the design patterns and anti-patterns best adapted to the new hardware architectures we are seeing will need to utilize the paradigms afforded by functional languages added to imperative (and even declarative) toolsets, but the problem is much more complex! The benefit of functional constructs is no new idea; however, tailoring a program’s composition, structure, and execution to best utilize the hardware topology is a very hard problem. The complexity of the hardware possibilities, and the correspondingly complex set of performance characteristics it introduces, is mind-boggling. The most commonly stated problem with bolt-on concurrency is the trouble that comes with unnecessarily shared variables and the race conditions they create when many execution units attempt access. We need the tools and design patterns (or anti-patterns) to productively produce code that we can be reasonably confident will run as we expect in any of the multitude of execution environments that pervade the digital landscape.
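
As a small illustration of the functional style this argues for, here is a sketch in Python (the score function and the input range are made up for the example): each worker applies a pure function to its own inputs and the results are combined afterward, so there is no shared mutable state for concurrent execution units to race on.

    from concurrent.futures import ProcessPoolExecutor

    def score(x):
        # Pure function: the result depends only on the argument, no shared state.
        return x * x + 1

    if __name__ == "__main__":
        data = range(1_000)
        with ProcessPoolExecutor() as pool:
            # Map a pure function over immutable inputs, then reduce, instead of
            # having many threads update a shared counter behind a lock.
            results = list(pool.map(score, data))
        print(sum(results))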

What new skills and tools should we be imparting to future generations of computer scientists? The languages most commonly mentioned in this new context seem to be Erlang, Haskell, Clojure, Scala, Go, and the like; these languages were designed with concurrency in mind. New languages and paradigms for concurrency are only part of the puzzle. We also need fundamentally new ways of conceptualizing data processing on a massive scale.

IV. Storage and Processing

Now, what about how to store, organize, and process all this data? While not without its dissenters (5), the MapReduce data processing paradigm and the wake of successes that have followed it are just one example of the many approaches being explored. The data stores used in these systems are typically of a key/value nature: Google’s Bigtable, Amazon S3, and Hadoop’s HDFS are all examples of this model. This model allows data to be loaded very quickly and in an ad-hoc manner, which is crucial to the success of these systems. In fact, the real business problem that these new analytics players are leveraging is that traditional business intelligence models typically rely on a process called Extract, Transform, Load (ETL) that transforms a transaction-based, mutable “live” database (along with other data of various types) into the OLAP cube the analysts use to provide actionable information to the leadership of the organization. Many organizations using traditional data warehousing products are finding it difficult for the ETL phase to stay current, so decisions have to be made on progressively less current (i.e. less accurate) data. The real power of these new MapReduce-based storage, processing, and retrieval techniques is their ability to process quantities of data previously unimaginable, on clusters of commodity hardware that can be rented by the hour. The database naysayers’ only argument is that you can pay (royally) to have RDBMS systems scale to whatever level of performance is desired. Unfortunately, the vast majority of organizations who have data that could be put to use do not have the budget to make the big RDBMS systems do what can theoretically be done with a fraction of the budget under these new models. What is missing currently are people who have skills in implementing these new techniques and practical knowledge of their limitations. One of the most commonly iterated criticisms of the big data trend is that it splits the world into an elite group who can leverage these tools and everyone else who cannot (6) (7). Some are envisioning a unified system that will employ concepts and processing models from both the RDBMS and MapReduce worlds to provide a best-of-both-worlds system that can deliver the desired performance characteristics for the desired workload (8).
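
For readers who have not seen the paradigm up close, here is a deliberately tiny, single-machine sketch of the MapReduce idea from (2), written in Python. Real systems such as Hadoop distribute the map, shuffle, and reduce phases across a cluster and persist intermediate results, but the shape of the computation is the same; the word-count example and helper names below are mine, not part of any particular framework.

    from collections import defaultdict
    from itertools import chain

    def map_phase(document):
        # Map: emit an intermediate (key, value) pair for every word.
        return [(word, 1) for word in document.split()]

    def shuffle(pairs):
        # Shuffle: group all intermediate values by key.
        groups = defaultdict(list)
        for key, value in pairs:
            groups[key].append(value)
        return groups

    def reduce_phase(key, values):
        # Reduce: combine the values collected for one key.
        return key, sum(values)

    documents = ["big data is big", "everything is data"]
    pairs = chain.from_iterable(map_phase(d) for d in documents)
    counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
    print(counts)  # {'big': 2, 'data': 2, 'is': 2, 'everything': 1}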

V. Analysts

With the realization of the analyst’s position in the Big Data puzzle, some have suggested packages such as R, Matlab, Octave, SPSS, and SAS as platforms well suited to bridge the familiar world of the analyst and the world of designing software and systems, bringing us closer to realizing much more fully the power that is latent in an organization better understanding itself through its data. These packages can and should have the ability to abstract away the details of where data is, where it needs to be, and how code is executed. This is a natural marriage of big data, scientific computing, and the massively parallel. These packages are in a unique position to immediately effect change for a large segment of the world’s problem solvers who have the skills in mathematics and statistics and are already familiar with these tools. In the recent article Efficient Statistical Computing on Multicore and MultiGPU Systems, the authors describe just such a scenario: they rewrite common statistical algorithms (chi-squared tests, the Pearson correlation coefficient, and unary linear regression) in MPI and CUDA and provide interfaces from the R language to utilize multiple CPU and GPU clusters to speed up calculations (9). They achieved a 20X speedup with a four-node MPI implementation and a 15X speedup with three GPUs in the CUDA implementation. The company Revolution Analytics has an open-core version of R for which it sells support and is attempting to close this gap from the industry side (10) (11) (12).
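
To show the flavor of such a rewrite, here is a toy decomposition of my own in Python (it is not the MPI/CUDA code from (9)): the Pearson correlation coefficient needs only a handful of sums, each of which can be computed independently on chunks of the data and then combined by addition, so the work parallelizes naturally across cores.

    from concurrent.futures import ProcessPoolExecutor
    from math import sqrt

    def partial_sums(chunk):
        # Per-chunk sums; these combine by simple addition across chunks.
        xs, ys = chunk
        return (len(xs), sum(xs), sum(ys),
                sum(x * x for x in xs), sum(y * y for y in ys),
                sum(x * y for x, y in zip(xs, ys)))

    def pearson(x, y, workers=4):
        size = max(1, len(x) // workers)
        chunks = [(x[i:i + size], y[i:i + size]) for i in range(0, len(x), size)]
        with ProcessPoolExecutor(max_workers=workers) as pool:
            totals = [sum(col) for col in zip(*pool.map(partial_sums, chunks))]
        n, sx, sy, sxx, syy, sxy = totals
        return (n * sxy - sx * sy) / sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))

    if __name__ == "__main__":
        x = list(range(1000))
        y = [2.0 * v + 3.0 for v in x]
        print(pearson(x, y))  # ~1.0 for a perfectly linear relationship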

VI. Deep Learning Networks

Areas such as machine learning and artificial intelligence, as well as more traditional statistical techniques, are big players in this game. In fact, it seems that these concepts and techniques will in all likelihood shed the most light on how we, as designers of software systems, can best leverage the resources available to bring about a fuller realization of Moore’s Law in the capability of our computer systems and associated physical devices at performing human tasks. This raises the question of what the real business of our field is, beyond solely helping organizations make better decisions. Richard Bellman, in his more than fifty-year-old work on dynamic programming, pointed out what today’s AI and big data researchers are constantly reminded of: as the number of dimensions in a pattern classification application increases linearly, the computational complexity increases exponentially. He coined the term “the curse of dimensionality” to describe this observation (13). The traditional way to battle this curse is to pre-process the data into fewer dimensions (a process called feature extraction) in an effort to reduce the overall computational complexity. Recent findings in neuroscience suggest that the neocortex does not perform feature extraction as we have come to know it, but instead propagates largely unaltered signals through a hierarchy that over time learns to robustly represent observations in a spatio-temporal manner. A new area of artificial intelligence research, called Deep Machine Learning, has cropped up around this insight, with the goal of more accurately modeling the way our brain functions (14).
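
A quick back-of-the-envelope illustration of the curse (the choice of 10 bins per dimension is arbitrary, purely for the example): if each feature is discretized into 10 bins, the number of cells a classifier would have to cover grows as 10^d, so each added dimension multiplies the space by another factor of 10.

    BINS_PER_DIMENSION = 10  # arbitrary resolution, chosen only for illustration

    for d in (1, 2, 3, 5, 10, 20):
        cells = BINS_PER_DIMENSION ** d
        print(f"{d:2d} dimensions -> {cells:.3e} cells to cover")

At 20 dimensions that is already 10^20 cells, which is why feature extraction, or an approach that sidesteps explicit enumeration as the deep learning work aims to, becomes unavoidable.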

VII. Conclusion

We are experiencing exponential growth not only in the number of transistors on a silicon chip but also in our ability to bring technology to bear on automating many more human tasks (15). We have cars that drive themselves, and the largest thing holding back an automated transportation system is people’s fears. We are on the brink of having wearable “heads-up display” computers that can help us in our most basic daily interactions. Thanks to leading artificial intelligence researchers, to Google (for publishing its secret sauce), and to the multi-billion-dollar game industry for pushing the limits of 3-D virtual worlds, we now have at our fingertips the power to literally transform our reality. We just have to figure out how to write the software to bring it to bear!

Works Cited

1. Kaisler, Stephen, et al. Big Data: Issues and Challenges Moving Forward. 46th Hawaii International Conference on System Sciences. Manoa, Hawaii: IEEE, 2013. pp. 995-1004.

2. Dean, Jeffrey and Ghemawat, Sanjay. MapReduce: Simplified Data Processing on Large Clusters. Communications of the ACM. January 2008, pp. 107-113.

3. Flynn, Michael J. Some Computer Organizations and Their Effectiveness. IEEE Transactions on Computers, 1972, pp. 948-960.

4. Pankratius, Victor, Schmidt, Felix and Garreton, Gilda. Combining Functional and Imperative Programming for Multicore Software: An Empirical Study Evaluating Scala and Java. International Conference on Software Engineering. Zurich, Switzerland: IEEE, 2012. pp. 123-133.

5. Kraska, Tim. Finding the Needle in the Big Data Haystack. IEEE Internet Computing. Jan-Feb 2013, pp. 84-86.

6. Leavitt, Neal. Bringing Big Analytics to the Masses. Computer. January 2013, pp. 20-23.

7. boyd, danah and Crawford, Kate. Six Provocations for Big Data. Oxford, UK: Oxford Internet Institute, 2011.

8. Qin, Xiongpai, et al. Beyond Simple Integration of RDBMS and MapReduce – Paving the Way toward a Unified System for Big Data Analytics: Vision and Progress. Second International Conference on Cloud and Green Computing. Xiangtan, China: IEEE, 2012. pp. 716-725.

9. Ou, Yulong, et al. Efficient Statistical Computing on Multicore and MultiGPU Systems. 15th International Conference on Network-Based Information Systems. Melbourne, Australia: IEEE, 2012. pp. 709-714.

10. Smith, David. R Competition Brings Out the Best in Data Analytics. Revolution Analytics, 2011. White Paper.

11. Revolution Analytics. Advanced “Big Data” Analytics with R and Hadoop. Revolution Analytics, 2011. White Paper.

12. Patel, Ishan, Rau-Chaplin, Andrew and Varghese, Blesson. A Platform for Parallel R-based Analytics on Cloud Infrastructure. 41st International Conference on Parallel Processing Workshops. Pittsburgh, PA: IEEE, 2012. pp. 188-193.

13. Bellman, Richard. Dynamic Programming. Princeton, NJ: Princeton University Press, 1957.

14. Arel, Itamar, Rose, Derek C. and Karnowski, Thomas P. Deep Machine Learning – A New Frontier in Artificial Intelligence Research. IEEE Computational Intelligence Magazine. November 2010, pp. 13-18.

15. Kurzweil, Ray. How To Create a Mind. New York, NY: Viking (Penguin Group), 2012.

Written by tmwsiy

March 4th, 2013 at 10:44 am