think stats - allen b. downey

What I like about Think Stats is that it is direct and to the point. It includes a case study that runs through the book and works on data available online. It provides a great starting point for exploring once you see how the given examples work. Each chapter has a handful of exercises that can get you started if you aren't sure what to do next. Downey has an easy style of writing and finds the fine line between enough information and too many details. That said, this book might be a bit thin if you don't have any experience with statistics or access to a mentor.

Keeping in mind that the book is a focused overview, it certainly supports the programmer who is looking for hands-on examples, but I believe it is also useful for the non-programmer who needs a quick understanding of the core concepts. They may not be able to do the calculations but they will be able to participate in a conversation.

As it's concise and has active examples, the book would be a great supporting text for a course that assumes some statistics experience but doesn't need the overhead of a full-blown stats book. As I have mentioned in other reviews, this book is a good addition to the O'Reilly collection of books on data mining - Segaran's Programming Collective Intelligence, Russell's Mining the Social Web, and Janert's Data Analysis with Open Source Tools.

Full information: Think Stats by Allen B. Downey, O'Reilly Media, Inc.

the ipad is almost perfect

I recently bought an iPad. It's the 64GB, wi-fi only model - in black if that matters. It took me a long time to get to the point where I thought it would be useful. I spent a good bit of time reading about them online and talking with people who had one. Ultimately, for me, the decision represented a compromise: get an iPad instead of a MacBook Air.

Portability and more efficient processing were the main reasons I was considering getting a machine. Work provides a notebook but it's a beefy thing that has the horsepower and storage space I need for a good bit of what I do. The 5+ pound brick is a bit much to lug around, though, as often I don't need the full capacity, e.g., going to visit relatives or short vacations.*

More recently the machine has not been able to meet my computing needs. One of the main issues is that I'm pushing the limits of its processing capacity. While many applications run reasonably well, anything with a video component causes the processor utilization to spike and, in turn, the fan to run at full speed. This is really problematic when using a webcam. Many of my classes are distributed between multiple campuses. In an attempt to be equitable, since resident and remote students are graded equivalently, I feel compelled to be available for video conferencing. Since my current machine doesn't have a built-in webcam I have to add an external one. This is also an issue while watching videos of any duration. It's not uncommon that computation-heavy programs I write/run also cause a near meltdown but I think that's expected.

An upgrade is not available to me until next August so I need a solution in the meantime.

After researching notebooks and tablets, I eventually decided that a MacBook Air was the best solution for me. Since I was arguing portability as a driver for the need, I was looking at the 11" version but wanted as much storage and horsepower as I could get. This meant selecting the 128GB model and upgrading the RAM from 2GB to 4GB and perhaps opting for the processor upgrade as well. So, those plus a couple of accessories later the price tag was about $1500. It was high enough that more thought had to be put into the decision. I looked at competing Windows machines but I wasn't able to get the same performance or a solid state drive - neither for the price of the Air nor at all.

The next option I considered was the iPad. With all of the positive reviews and feedback I received from others it became a very viable option. It of course isn't the same as a notebook but the cost was half that of the Air and the tablet functionality was something that needed to be taken into consideration. In the end, the iPad was the direction I decided to go.

As you might expect, I am happy with the purchase. So far the only problem I have encountered is that I haven't found a way to share my iPad screen to others during web conferencing. Where I need it most is when using Adobe Connect as that is the platform used to distribute my classes. It's also a convenient tool during office hours if I need to demonstrate/draw.

A feature I use extensively in class is PowerPoint's pen mode during a presentation. We have interactive tablets in the classrooms where the majority of my classes are held. What I was hoping for was the ability to do that via the iPad when at some other location. There might be an app that supports this but I haven't found it. I have the Keynote app, which I am sure will be generally useful, but I wish the feature existed.

I feel the compromise was a good decision. If nothing else I can while away my free time with GarageBand.


* Technically university-owned computing equipment should only be used for work-related activities so I should buy my own for personal computing. However, there is an expectation that I am accessible when not on campus. So, in exchange for the mix of work and life, I'm completely comfortable with the general use of my notebook for personal reasons.

data analysis with open source tools - philipp janert

Data Analysis with Open Source Tools does a great job covering a lot of topics in a way that balances theoretical explanations and practical demonstration. In keeping true to its title, a wealth of tools (and data sources) are identified and explored.

Because the book offers a balance between explanation and demonstration it can be read in two different ways. First, you can read the chapters without getting involved with the code to get a better understanding of the whys and hows of the different analysis techniques. On the other hand, if you are more of a brass tacks person, you can focus on the code, run the examples, and just skim the explanations.

For those who are exploring the world of data analysis, this book is a great complement to Segaran's Programming Collective Intelligence and Russell's Mining the Social Web. Where the books overlap, the explanations and examples differ, which helps enormously when trying to master the concepts and techniques. However, each book contains topics not in the others. Collectively they offer a rather powerful set of tools.

Having read the other books prior to this one, I really appreciated the time spent on the mathematics behind each technique. The others get your hands dirty very quickly - and I appreciated that greatly when first exploring data mining - but I found myself wanting to have a deeper understanding, which this book so nicely provides. As Janert mentions in the first chapter, the succinct notation of mathematics is much clearer than having to try to extract the essence of twenty lines of source code. Without a doubt, though, Data Analysis is dense, and that might turn a few people off.

All said and done, I'm glad I took the time to read the book and will definitely keep it nearby.

Full information: Data Analysis with Open Source Tools by Philipp Janert, O'Reilly Media, Inc.

it's really about texture

I happened upon a short article, "A Few Pedagogical Insights about C++ Teaching: Public Data Members", by Danny Kalev. The thrust of the article is that exposing data members in a class should mostly, if not completely, be avoided.

Encapsulation is at the heart of object-oriented design and programming. The point is to avoid dependencies on implementations and stick to interfaces. As an implementation of a class needs to evolve, it's likely an interface can be preserved. As the article points out, building the get/set interfaces for attributes seems a bit heavy in cases where changing a value doesn't require changes in other attributes to keep the class coherent.

I sit on the same side of the spectrum as Kalev. The only argument I can see in favor of enabling direct access to data members is to meet performance requirements. There might be instances where the overhead of an operation call can't be tolerated necessitating public visibility. However, these cases should be treated as a rare exception. I would also argue that such a decision needs to be made quite late in the design process and only after it has been demonstrated that the overhead is too great.

Regardless of which side of the fence you sit on, the discussion involves the notion of texture. Texture was initially defined by Jazayeri, Ran, and van der Linden in Software Architecture for Product Families: Principles and Practice. It seems to be a little-used concept but it resonates with those I know who have read the book.

Conceptually, texture is the look and feel of an architecture. Elements in an architecture - and as a result the design and implementation - should be consistent and predictable. Such an architecture is easier to manage and maintain, enables people unfamiliar with a system to grasp it quickly, and helps gauge the difficulty of reusing architecture, design, and implementation elements. (In fact, colleagues and I wrote a paper asserting that texture is the key to measuring reusability.) More formally, elements of texture fall into one of three categories: policies, patterns, and aspects.

With respect to the problem at hand, Kalev's argument for the need for an "objective criterion" is, in a sense, a texture argument: a policy needs to be established that dictates when attributes should be publicly available. Having such a policy (and assuming it's followed/enforced) ensures the decision is made consistently. Defining such a criterion is, of course, quite difficult.

I fully recommend the Jazayeri/Ran/van der Linden book, though it seems to no longer be in print. Though I can't quantify it, I think it should have received more attention than it did. It's definitely worth a read if you can find a copy.

what (not?) to wear

In fall 2009 I opted for a wardrobe change and moved from the tshirt/jeans approach to a more formal coat/tie approach. I've had some difficulties with students taking me seriously at times and I attributed some of that to the fact that I look like one of them*. The nature of a small campus also means there are only a handful of faculty members so I see the same students for multiple semesters in a row. Switching to a more formal attire was an effort on my part to assert a visual reminder that I, in fact, am not one of their peers.

Recently I surveyed my students on professor attire. I provided them the following choices:
  • Faculty should wear a suit and tie.
  • Faculty should at least wear a suit and no tie.
  • Faculty should at least wear a coat, khakis, dress shirt, and tie.
  • Faculty should at least wear a coat, dress shirt, khakis.
  • Faculty should wear at least a coat, casual button down shirt, and casual pants (non-jeans).
  • Faculty should wear at least casual button down shirt (jeans as pants are okay).
  • Wear whatever. It doesn't matter to me.

86% of my students responded to the survey with the following distribution:


So, going by the response, if I were to dress with a sport coat, casual shirt, and casual pants, but no jeans, I would satisfy 81% of the students. I also asked for comments explaining their choice. Many of them indicated course content is more important than the professor's attire but a few commented on how professors should dress to reflect their professional position.

The rationale for the shift to more formal attire was an experiment to see if, in my case, it made a difference. Between my young appearance and successive semesters with students, it's hard to not develop an informal rapport with the students. In many ways, this level of informality is a good thing. Students often feel more comfortable in class, talk more, send more emails and tend to utilize office hours more frequently than others (that's still not all that much but there's a difference). Much like a parent who tries to skirt the line between friend and parent to their child, this leads to problems when it comes time to enforce boundaries such as due dates and bad grades.

The question, of course, is, "Did it work?" I don't know that I have any way to really assess that. The only quantifiable indicator I have comes from student evaluations and over the two years my overall scores did improve a bit. There are a lot of factors that could account for that change but I don't think I changed the approach to my courses that much over the semesters. For one group, they only knew me tie-clad and their evaluations were the same as the classes that saw the switch.

I was discussing this with a colleague and he remarked, "Why would you let the students decide how you dress?" This made me reflect more on what I was doing. In some ways, yes, I suppose I am letting the students influence my appearance. Still, dress is part of establishing a proper environment. Personally, though, I found it a bit more than that. For the two years of the experiment, there was something about the ritual of getting dressed. I always had to shave, I had to pay attention to what shirts and ties went with what pants and coat. And, no matter how I felt, the ritual was the same. It's akin to an athlete putting on a uniform.

Going forward, I'll probably reserve the khakis, ties, and dress shirts for the more formal things I need to attend. I will go back to casual button-downs and pants. It's more my style anyway. I feel more comfortable.

* When I first arrived on campus in my early 30s, I looked more mid-20s. A decade, two kids, and a tenure process later, I am probably starting to look more my age, for whatever that's worth.

review: the bed of procrustes by nassim nicholas taleb

I jumped in with both feet in buying this book. I had read Taleb's The Black Swan and (as of this writing, part of) Fooled by Randomness. I found both very thought provoking and, in the case of the former, my interest was fueled enough to read a couple of the references. Without looking into it further, I pulled the book onto my Kindle and started reading and, where I was expecting discussions about aphorisms, found essentially nothing but aphorisms. The prose I was accustomed to was reduced to independent (though related if digested to a reasonable degree) statements. It is another way of delivering the ideas. At first I felt a bit "seriously?" but once I was past the initial surprise it was refreshing.

As always, Taleb is not lacking for unforgiving conviction (colored by his confident sense of humility) but that makes his writing quite focused. In the prose-based books he tries to cover the space of his ideas, arguing from multiple points of view, and the confidence helps cut to the chase more efficiently. The aphorisms take that approach even further. I don't necessarily agree with the assertions of all of the aphorisms but that's entirely the point - the reader should arrive at his own conclusions. Having come to respect his ability to be insightful, the aphorisms were worth considering.

In gathering my thoughts before writing this I considered how I might recommend reading the book: by chapter or by individual aphorism. There's merit to both but I think I lean to reading by chapter as there are common threads. I understand the point of an aphorism is to be standalone but there is also something to be said about the larger picture.

An interesting impact this book has had has been at the breakfast table. I am fortunate enough to have the flexibility in my schedule to be the one who ensures my seven-year-old daughter is fed, dressed, and at the bus stop on time. We eat a quiet breakfast together every morning and it affords an opportunity to engage in interesting conversations. Sometimes we discuss things such as mathematical concepts, those way beyond the second grade expectations, and sometimes it's just as simple as her telling me about her day (I work nights twice a week and arrive home after she's in bed). The aphorisms have given us something else to talk about.

I think the book is worthwhile if aphorisms are of interest. It really provides an opportunity for reflection if you take the time to do so.

[Of course, I realize, taking at face value some aphorisms in the chapter "The Republic of Letters", the fact that I have written this review invalidates it. So take it for what it is (or isn't) worth.]

model(ing) employees

An article in the NY Times on Sunday discussed Google's effort to understand what makes their best managers their best managers. As the article reports, the top ideas were not all that surprising:
  • Be a good coach
  • Empower your team and don't micromanage
  • Express interest in team members' success and personal well-being
  • Be productive and results-oriented
  • Be a good communicator and listen to your team
  • Help your employees with career development
  • Have a clear vision and strategy for the team
  • Have key technical skills so you can help advise the team
The above list is ordered most important to least. Managers that excelled in these areas were typically leading in productivity as well as employee retention and happiness.

Similar results were obtained by a study done around 2005* that attempted to determine characteristics of the best project managers. The report was summarized and discussed in the book Alpha Project Managers by Andy Crowe. The book didn't order the characteristics in terms of importance but the overlap is there. A quick search yields similar observations elsewhere. (Some might be grounded in actual studies but many seem to be more anecdotal.)

Of interest might be the relationship between the Alpha Project Manager study and Google's. The Times article quotes Prasad Setty, Google’s vice president for people analytics and compensation:
“We want to understand what works at Google rather than what worked in any other organization.”
So, whereas Google's study is specific to Google (though global in scope), Crowe's crosscut many organizations over many countries (in North/South America, Europe, Asia and Africa). The point is the characteristics are generally the same.

In terms of modeling itself, this isn't so strange. IBM has engaged in efforts to model its employees - both workers (see Stephen Baker's The Numerati) and management. As with Google, part of the effort is to understand their best people and how to help the other employees reach their potential. However, on a more practical side, modeling their workforce will help managers find the right person to fill a specific need as well as, particularly with consulting efforts, optimize teams to be deployed to clients.

What emerges from these studies is advice for those entering, or thinking about entering, the job market. If companies can determine characteristics of their best employees, it would be natural to screen applicants using those qualities. For certain, organizations have always reviewed resumes and interviewed potential hires looking for certain qualities they deemed important. What's different now is that organizations may be approaching the process with hard data.

* I wasn't able to readily find the year the study took place. The book, however, has a copyright of 2006 so I'm estimating the study ended in 2005.