Adventures in Swift

Now that I’ve scrapped the idea for my current project of a tabletop interface that can be approached by multiple users from either side, I haven’t yet found another compelling reason to use the Perceptive Pixel in particular. As I’ve started to think through the implementation itself, I’ve also realized that my project, as compared to others, is going to require far more work in low-level implementation. For instance, there are only a couple of projects (including mine) where we’re going to have to dig into the nitty-gritty of gesture recognition, as opposed to having the convenience of this work already being done for us in a library. I wouldn’t have thought to scrap the Perceptive Pixel idea unless I knew of a better alternative. Fortunately, that better alternative is a touch platform that I already know. I’ve been developing on iOS sporadically since 2009, so using it for this project at least spares me having to learn an entirely new platform, framework, and toolchain in order to develop for the Perceptive Pixel. For these reasons, I’m now developing my project for the iPad.

I’d heard from several people that Swift, Apple’s new language for iOS and OS X development, was at least enjoyable to use. I jumped right in, read up a bit, and decided to use Swift for this project as well. This post, then, is a write-up of a couple of the more interesting and/or frustrating things I’ve come across in developing my first NUI using Swift.

First, there are plenty of resources out there touting Swift’s interoperability with Objective-C, C++, and C. As all other tools (Apple and third-party frameworks and libraries) are written in one of these other languages, this is a must. However, when it comes to interoperability, I’ve hit one major roadblock: class methods on ‘bridged’ Objective-C classes. See, if you want to use Objective-C classes in your Swift project, you bring them in with a bridging header. When a given Objective-C class/framework/library is imported into Swift, its public API is remapped to something usable in Swift. This largely affects the arguments those remapped functions and methods expect from you and the return values they give back to you. In general:

  • Certain Objective-C types are remapped to their equivalents in Swift, like id to AnyObject. id is Objective-C’s pointer-to-anything, and it comes across in Swift as an optional AnyObject, which can wrap an object of any class; the optional wrapper behaves much like a Maybe in Haskell.
  • Certain Objective-C core types are remapped to their alternatives in Swift, like NSString to String.
  • Certain Objective-C concepts are remapped to matching concepts in Swift, like pointers to optionals.
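
As a quick illustration of these mappings in practice, here’s a minimal sketch using Foundation types (the values are arbitrary):

    import Foundation

    // An NSArray built from a Swift array literal.
    let names: NSArray = ["Alice", "Bob"]
    // firstObject returns id in Objective-C, which arrives as an optional AnyObject.
    let first: AnyObject? = names.firstObject
    // NSString bridges to Swift's String.
    if let name = first as? String {
        println("First name: \(name)")
    }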

Interacting with the API that’s exposed to you using Swift is fairly straightforward. For more information, Apple has made a start at documenting this process. I say ‘made a start’, because there are some places where this documentation still sucks. One of these places is in describing how Objective-C class methods are mapped to Swift. Not understanding this is something that has given me quite a few problems, especially when it comes to interacting with third-party (non-Apple) code that is only available in Objective-C. Here’s the documentation that Apple does give for this:

For consistency and simplicity, Objective-C factory methods get mapped as convenience initializers in Swift. This mapping allows them to be used with the same concise, clear syntax as initializers. For example, whereas in Objective-C you would call this factory method like this:
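    // Apple's example is UIColor's factory method; the argument values here are illustrative:
    UIColor *color = [UIColor colorWithRed:0.5f green:0.0f blue:0.5f alpha:1.0f];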

In Swift, you call it like this:
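    // The same call, through the mapped convenience initializer:
    let color = UIColor(red: 0.5, green: 0.0, blue: 0.5, alpha: 1.0)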

In other words, all class methods (what Apple is calling ‘factory’ methods) and all initializers on your Objective-C classes get mapped to convenience initializers in Swift. This essentially means that every Objective-C class method and initializer on some class MyClass gets mapped to some signature of a MyClass(...) call. My first problem here is that not all class methods are necessarily factories. Beyond that, though, Apple really doesn’t give us any explicit information as to how this mapping takes place.

We do know this much: a call to the sole initializer of an Objective-C class that takes no arguments is mapped straight across:
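    // Objective-C (MyClass is a placeholder):
    MyClass *object = [[MyClass alloc] init];

    // The same thing in Swift:
    let object = MyClass()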

That’s easy enough. For initializers that do take arguments, I’ve resorted to relying on code completion in order to figure out how those initializer signatures are mapped. If I have this Objective-C class, for instance:
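    // A sketch of such a class; the parameter types here are placeholders of my own:
    @interface MyClass : NSObject

    - (instancetype)init;
    - (instancetype)initWithFoo:(NSString *)foo andBar:(NSString *)bar;
    - (instancetype)initFoo:(NSString *)foo andBar:(NSString *)bar;

    @end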

The bare initializer is straightforward. There’s some changing of parameter names that goes on under the hood for those initializers that do take parameters, however. These are the respective calls in Swift:
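    // As reported by code completion (the argument values are illustrative):
    let a = MyClass()                           // from -init
    let b = MyClass(foo: "foo", andBar: "bar")  // from -initWithFoo:andBar:
    let c = MyClass(foo: "foo", andBar: "bar")  // from -initFoo:andBar: (note the identical signature)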

It looks like changes to parameter names only occur for the first parameter. Got it. First, ‘init’ is always stripped off of the front of that first parameter name, so -[MyClass initWithFoo:andBar:] becomes MyClass(foo:, andBar:) in Swift. Second, ‘with’, if used, is also stripped from the first parameter name. Now, having two initializers with the signatures -[MyClass initWithFoo:andBar:] and -[MyClass initFoo:andBar:] is certainly poor style, but these two initializers will map to the same Swift initializer, namely MyClass(foo:, andBar:). If you have defined similar methods to these, and each has differing behavior, you’re not going to see the results you expect to see. I’m not entirely certain that I understand Apple’s motivation in stripping words from parameter names; I’ll assume it’s an effort at maintaining succinctness. But Apple’s stump speech for Swift has been all about safety, and I’m not sure this behavior is in line with that goal.

Similar things happen with class methods (or, as Apple calls them, ‘factory’ methods). They’ve given us one hint in the previous example: +[UIColor colorWithRed:green:blue:alpha:] maps to UIColor(red:, green:, blue:, alpha:) in Swift. So, it’s clear that something is going on here with the first parameter name. And if you guessed that it has to do with the fact that a suffix of the class name is also a prefix of the first parameter name, you’d be right. It seems that any portion of the first parameter name of a class method that is a suffix of the class name itself is removed from the parameter name. This causes two problems. First, as documented here, all three of these class methods map to the same initializer:
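    // An illustration of the pattern, using the hypothetical FancyDoodad class discussed below;
    // under the stripping rules above, all three of these collapse to FancyDoodad(thing:):
    + (FancyDoodad *)doodadWithThing:(NSString *)thing;
    + (FancyDoodad *)fancyDoodadWithThing:(NSString *)thing;
    + (FancyDoodad *)doodadThing:(NSString *)thing;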

Wow, that’s annoying. Do you see the second issue, though? What if I have a class method with the signature +[FancyDoodad doodad] that does a whole bunch of grunt work in setting up my FancyDoodad? To what initializer does this class method map in Swift? That’s right: the bare initializer. ‘doodad’ is stripped out, and it’s left with nothing to map to other than the bare initializer, FancyDoodad(). Can I work around this in the Objective-C classes that I write and intend to use with Swift? Of course. But I’ve yet to come across a case where I’ll need to write some helper class in Objective-C. Most of the Objective-C I use now is old code that I’ve written or, more often, third-party libraries. What do I do when one of these libraries gives me a class method that makes its class so much more convenient to use, but follows this (very common) naming pattern? Well, I’m left with two obvious options: either go and change the class method signatures in those third-party classes (which would be an enormous mistake), or write a category on the third-party class that monkey patches in a new method that can get at the original Objective-C method. Will that work? Yes. Is it a pain in the ass? Also, yes.
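
For what it’s worth, here’s a rough sketch of that category approach, using the hypothetical FancyDoodad class and its +doodad factory method from above:

    // FancyDoodad+SwiftBridging.h
    @interface FancyDoodad (SwiftBridging)
    // A new name that shouldn't be swallowed by the initializer mapping:
    + (instancetype)makeDoodad;
    @end

    // FancyDoodad+SwiftBridging.m
    @implementation FancyDoodad (SwiftBridging)
    + (instancetype)makeDoodad {
        return [FancyDoodad doodad];   // call through to the original class method
    }
    @end

In Swift, this should then come through as an ordinary type method, something like FancyDoodad.makeDoodad(), rather than being folded into the bare initializer.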

So, that’s one of the pitfalls I’ve come across while working with Swift and Objective-C in developing this NUI. And, considering that this last issue tramples code that follows a very common pattern in the Objective-C community, I’d say that Apple has screwed the pooch on this one. It’s certainly fixable, and I think it behooves Apple to do so. I’ve come across a number of other issues (getting used to juggling optional types again, for one), but those will have to wait for another post.

Other Similar Database-Related Interfaces

Here are a few thoughts on a couple of database-related interfaces (two of them also touch-based) that are similar to the query-building NUI that I’m currently building. I’ll look specifically at areas where my tool can improve, as well as at problems these other projects have dealt with that I should be aware of as I continue to work on this project.

First, Stefan Flöring and Tobias Hesselmann created TaP,

a visual analytics system for visualization and gesture based exploration of multi-dimensional data on an interactive tabletop…[U]sers are able to control the entire analysis process by means of hand gestures on the tabletop’s surface.

This doesn’t seem to be entirely true, as many tasks are accomplished through half-circle radial menus called stacked half-pie menus. I’ll also note here that while the authors claim this to be a collaborative interface, all collaborators must be grouped around a single edge of the tabletop. It seems that Flöring and Hesselmann were also unable to address the problem of coherently representing orientation-sensitive entities to multiple users at different positions. They do acknowledge this, but give no advice on how to address the problem.

While TaP isn’t a query construction tool, there are several issues that Flöring and Hesselmann have addressed from which it may be useful for me to learn. While their half-pie menus may not make sense as direct replacements for the menus I have currently designed, it is possible that their layered approach to the radial menu may be useful. I also like the ability to call the menu forth from any location on the screen with the heel of the palm. TaP’s dropzones are also in line with my thinking, and seem to be intuitive from watching the video.

The only real gestures beyond the obvious ones for moving and scaling objects are the tracing of rectangles to create new charts and the tracing of circles to open the help menu. These seem contrived to me, like gestures created just for the hell of it. For better or worse, this reinforces my aversion to designing gestural interactions in my tool unless they seem specifically useful or called for.

GestureDB is a tool very similar to the tool I am currently designing. The designers of GestureDB describe it as

a novel gesture recognition system that uses both the interaction and the state of the database to classify gestural input into relational database queries.

The primary difference between GestureDB and the tool I am developing is that GestureDB addresses problems in designing queries against relational databases. My tool, on the other hand, targets NoSQL (non-relational) databases. While there are similar problems in designing queries for both relational and NoSQL databases, building queries for NoSQL databases does present its own unique set of challenges. Nevertheless, there are a number of things to learn from the experiences of the designers of GestureDB.

First, simple gesture recognition may not satisfactorily capture the range of a user’s intent when designing a query. To address this issue, the designers of GestureDB use an entropy-based classifier that draws on two sources of features. On the one hand, it narrows the set of potential gestures based on spatial information contained in the gesture itself. On the other, it prunes the space of possible user intent by examining which of the actions represented by those gestures are more likely than others, given the constraints imposed by the underlying database structure. From these, the classifier selects the most likely intent from among all possible intents. Building such a classifier may not be within the scope of this project, but the approach may prove worthy of consideration as I continue development.
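
This isn’t GestureDB’s implementation, but a toy sketch of the general idea, combining a spatial score for each candidate action with a schema-derived prior, might look something like this (all of the names and scores are my own illustrations):

    struct CandidateAction {
        let name: String          // e.g. "join" or "filter" (illustrative)
        let spatialScore: Double  // how well the touch trace matches this gesture (0...1)
        let schemaPrior: Double   // how plausible the action is, given the schema (0...1)
    }

    func mostLikelyAction(candidates: [CandidateAction]) -> CandidateAction? {
        var best: CandidateAction?
        var bestScore = 0.0
        for candidate in candidates {
            // Prune actions the database structure rules out entirely,
            // then weight the rest by how well the gesture trace matches.
            let score = candidate.spatialScore * candidate.schemaPrior
            if candidate.schemaPrior > 0.0 && score > bestScore {
                best = candidate
                bestScore = score
            }
        }
        return best
    }

    // Usage, with illustrative scores:
    let guess = mostLikelyAction([
        CandidateAction(name: "join",   spatialScore: 0.7, schemaPrior: 0.9),
        CandidateAction(name: "filter", spatialScore: 0.8, schemaPrior: 0.0)  // the schema rules this out
    ])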

Second, GestureDB provides means for just-in-time access to the underlying data in order to more efficiently design queries. For instance, simple preview gestures allow the user to see the data they are querying against and to modify their gestures before completing them.

Finally, the ability to undo an operation adds to GestureDB’s flexibility. While this has seemed to me to be a nice-to-have feature, I see it now as even more important. While some aspects of the interactions I am designing allow for implicit undo, at some point it will be necessary to explicitly undo any operation, as well as to undo many successive operations.

There are also numerous ways in which GestureDB seems to be successful that reinforce the design I am considering. Representation of ‘tables’ as real objects that the user can manipulate seems effective. In addition, separating the interaction space into a ‘well’ where tables are selected and a ‘sandbox’ where tables are dropped in order to be shaped into a portion of the query also seems to be effective.

As Nandi and Mandel state, precious few tools for graphical construction of database queries exist for touch interfaces. This leaves me in the exciting position of working in an area where little progress has yet been made, but at the same time having little in the way of the experiences of other researchers from which to draw. These examples of similar work that I have been able to find do, fortunately, provide helpful advice on common pitfalls that I might avoid, as well as reinforcement of not only the utility of such a tool, but the appropriateness of a number of design decisions that I have already made.

First Crit

A few days ago, we had our first crits of our initial design ideas for our course project NUIs. As I’ve previously described, I’m designing a NUI for the Perceptive Pixel for exploration of a MongoDB dataset.

The biggest question around my current design is how exactly collaboration is accomplished with the NUI. In reality, this question is about more than the NUI itself; really, the question is, how do multiple parties collaborate in data exploration? To begin to consider this question, it’s important to understand what the data are in my first target use case.

This use case considers data exploration with the Emotion in Motion dataset. Emotion in Motion is a large-scale experiment that measures subjects’ physiology while they listen to different selections of music. ‘Documents’ in the dataset come primarily in three flavors: trials, signals, and media. An abbreviated trial document looks something like the following:
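    // A reconstructed sketch; property names beyond 'media' and 'answers' are illustrative:
    {
        "_id": ObjectId("..."),
        "media": [
            ObjectId("537e601bdf872bb71e4df26d"),
            ObjectId("..."),
            ObjectId("...")
        ],
        "answers": {
            // demographic information ...
            // per-media answers, e.g. a 'liking' rating of 4 for the first song above ...
        }
    }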

Those entries in the media property correspond to three different songs to which this subject listened. The answers property contains both demographic information, as well as answers to questions that this subject was asked after listening to each song. For instance, after listening to the song with label ObjectId("537e601bdf872bb71e4df26d") (from the media property), the subject rated their ‘liking’ of the song as 4 on a scale of 1-5. The media ObjectIds point to media documents that look something like this (also abbreviated):

And finally, each media in a trial is associated with a signal document. Here’s an abbreviated example:
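    // A reconstructed sketch; beyond 'signals', 'eda_status', and 'hr_status',
    // the property names are illustrative:
    {
        "_id": ObjectId("..."),
        "media": ObjectId("537e601bdf872bb71e4df26d"),
        "signals": {
            "eda": [ /* one very long array of instantaneous values */ ],
            "hr": [ /* ... */ ],
            "eda_status": [ /* 0/1 acceptability flags */ ],
            "hr_status": [ /* 0/1 acceptability flags */ ]
        }
    }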

Each of the properties under the signals property is a very long array, with each entry representing the instantaneous value of a given signal as measured at a specific point in time while the subject listened to the associated media file. The entries in eda_status and hr_status are binary indicators of the acceptability of the corresponding EDA and HR signals at that moment in time. In addition, we work with a far greater number of features that are derived from the raw and filtered physiological signals.

Looking at one of these combined media/signal/trial documents in any detail takes a considerable amount of screen space. The problem is, we are approaching 40,000 ‘song listens’, and this number continues to grow daily. Within the next two years, we expect to be well beyond 100,000 listens. So, for a given song, an interface to explore signals from the, say, 2,000 subjects who have listened to that song needs to be carefully considered. And how do we go about creating an interface with which multiple people can work together to explore such a dataset?

The most obvious way to visualize data like this is to create individual plots for each type of signal/feature (tonic EDA and heart rate variability, for instance). These plots are naturally aligned vertically, as they all correspond to a common timebase. How, though, do multiple people easily manipulate and view this visualization? I’ve imagined the scenario for this project to be one in which the Perceptive Pixel is used as a tabletop interface. Thus, the most obvious arrangement of users is on either side of the table. Is each user shown their own separate visualization/interface in the orientation that is correct for them? Is the separation of displays used only during the exploration process and later combined for a larger visualization? If the exploration is to be tightly linked (each party works closely together during the exploration), how is the interface oriented? Or, does a less tightly linked interaction better suit this scenario?

These are the kinds of questions that came up during my first crit. Many of them would be easily addressed by mounting the Perceptive Pixel vertically, and in the end, this may be the best solution. I’m still enjoying the challenge of exploring ways to create a collaborative NUI using a tabletop interface that deals with content that is highly sensitive to orientation, though.

Perceptive Pixel

I’ve had the opportunity to think a bit more about this NUI-based tool for MongoDB data exploration and visualization. In addition, I’ve been able to discuss the project with Doug Bowman. I now have a bit more clarity about what I’d like to see from this interface, and what first steps I should take.

First, Chris North introduced Virginia Tech’s new Microsoft Perceptive Pixel at the ICAT Community Playdate last Friday.

From Microsoft:

The Perceptive Pixel (PPI) by Microsoft 55″ Touch Device is a touch-sensitive computer monitor capable of detecting and processing a virtually unlimited number of simultaneous on-screen touches. It has 1920 x 1080 resolution, adjustable brightness of up to 400 nits, a contrast ratio of up to 1000:1, and a display area of 47.6 x 26.8 inches. An advanced sensor distinguishes true touch from proximal motions of palms and arms, eliminating mistriggering and false starts. With optical bonding, the PPI by Microsoft 55” Touch Device virtually eliminates parallax issues and exhibits superior brightness and contrast. And it has built-in color temperature settings to accommodate various environments and user preference.

While the unit is quite impressive, I’m most interested in how this interface might enable something truly unique for this project. Other than space around the unit, there’s no limiting factor on the number of users who might view and interact with on-screen content. There is plenty of space for multiple users to carve out their own visualizations, as well. So, I’ll be working with the Perceptive Pixel instead of the iPad. The learning curve will be steeper for me, as I’m already a competent iOS developer, but I think it will be worth the additional effort.

Second, I’m concerned about biting off more than I can chew in this project. Both data exploration and visualization (in particular, of the dataset with which I’m always working) are important for me to have. However, given the duration of the project, trying to get very deep into both might be too ambitious. Instead, I’ll be focusing on developing an interface for collaborative visualization of NoSQL data; data exploration can come later. This will likely mean that the first several iterations use only canned data from the dataset.

So, the first step is to jump into C#. I’m not particularly excited to work on a Microsoft stack, but if this is what working with the Perceptive Pixel requires, so be it. The next step is to begin to brainstorm design ideas–more to come on that this week.

NUI Project

One of our ongoing studies is Emotion in Motion, a large-scale experiment that collects physiological data from people while they listen to selections of music. Emotion in Motion began in 2010, while we were working as Ph.D. researchers at Queen’s University Belfast. It first ran for several months in the Science Gallery in Dublin, Ireland. There, we went through several iterations of the experiment: the questions we asked the participants changed, the music selections changed, and so on. Since Dublin, Emotion in Motion has been staged in New York City, Bergen (Norway), and Manila (the Philippines). We are currently preparing to deploy Emotion in Motion in Taiwan for the entirety of 2015.

The data generated by Emotion in Motion were originally written to formatted text files. We wrote parsers for these files for whichever environments we chose to work in. As Emotion in Motion’s life has continued, however, we’ve recognized that we really need a better method for storing and accessing these data. Across all of these iterations, while we’ve made a number of changes to the content of the experiment, its overall structure has remained relatively stable: participants are always watching or listening to some form of media; we are recording their physiology and asking them questions about their experiences. We decided that a NoSQL database would allow us to store huge numbers of data entities that share many similarities in some respects but may vary wildly in others. For instance, while we record the same physiological signals from all participants during each media session, the lengths of the media selections are not all the same. Or, while we ask for the same demographic information from all participants, we may ask different questions in response to each media selection. The difficulty of fitting these varying schemas into an RDBMS’s tables made a NoSQL solution the obvious alternative.

So, I now find myself doing a great deal of work in MongoDB. The learning curve has been surprisingly gentle, and I’m very comfortable querying through the scripting interface. One thing I have found myself wanting, though, is an easy means of quick-and-dirty visualization for data exploration and high-level analysis. Currently, my workflow is to refine queries using the scripting interface, pull the data I need from MongoDB, and then use an external tool (MATLAB, R, etc.) to visualize the data. It would be very useful to be able to visualize queries on the fly, instead of hopping through this piecemeal workflow. In addition, the modularity of MongoDB queries and aggregations would lend itself well to construction and refinement through a graphical interface.
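
As a rough illustration of that modularity, even a quick exploratory query in the shell is already a chain of discrete, composable stages, which maps naturally onto graphical building blocks (the collection and field names here are stand-ins for the real schema):

    db.trials.aggregate([
        { $unwind: "$media" },                                        // one document per song listen
        { $match: { media: ObjectId("537e601bdf872bb71e4df26d") } },  // keep listens of a single song
        { $group: { _id: "$media", listens: { $sum: 1 } } }           // count them
    ])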

It’s this real, personal need that has led me to choose to build such a tool for a tablet interface as my semester-long project in Doug Bowman’s class on natural user interfaces. Some of the other ideas I was toying with were:

  • Tabletop audio editing tool
  • Gestural music improvisation tool
  • Live music performance looping tool
  • Gestural musical score following tool

The musician in me would love to build any of those tools. Certainly, it would make the project more enjoyable and motivating for me. The researcher in me (who just needs to finish this ****ing dissertation) needs what I’ve described in order to do his work. Practicality and necessity beat out fun and exciting in this case. I’ll post more as the project progresses.