Other Similar Database-Related Interfaces

Here are a few thoughts on a couple of database-related interfaces (two of them also touch-based) that are similar to the query-building NUI that I’m currently building. I’ll look specifically at areas where my tool can improve on them, and at problems these other projects have dealt with that I should be aware of as I continue to work on this project.

First, Stefan Flöring and Tobias Hesselmann created TaP,

a visual analytics system for visualization and gesture based exploration of multi-dimensional data on an interactive tabletop…[U]sers are able to control the entire analysis process by means of hand gestures on the tabletop’s surface.

This doesn’t seem to be entirely true, as many tasks are accomplished through half-circle radial menus called stacked half-pie menus. I’ll also note here that while the authors claim this to be a collaborative interface, all collaborators must be grouped around a single edge of the tabletop. Nor were Flöring and Hesselmann able to address the problem of coherently representing orientation-sensitive entities to multiple users at different positions. They do acknowledge this, but give no advice on how to address it.

While TaP isn’t a query construction tool, there are several issues that Flöring and Hesselmann have addressed from which it may be useful for me to learn. While their half-pie menus may not make sense as direct replacements for the menus I have currently designed, it is possible that their layered approach to the radial menu may be useful. I also like the ability to call the menu forth from any location on the screen with the heel of the palm. TaP’s dropzones are also in line with my thinking, and seem to be intuitive from watching the video.

The only real gestures beyond the obvious ones for moving and scaling objects are the tracing of rectangles to create new charts, and the tracing of circles to open the help menu. These seem contrived to me, like gestures created just for the hell of it. For better or worse, this reinforces my aversion to designing gestural interactions in my tool unless they seem specifically useful or called for.

GestureDB is a tool very similar to the tool I am currently designing. The designers of GestureDB describe it as

a novel gesture recognition system that uses both the interaction and the state of the database to classify gestural input into relational database queries.

The primary difference between GestureDB and the tool I am developing is that GestureDB addresses problems in designing queries against relational databases. My tool, on the other hand, targets NoSQL (non-relational) databases. While there are similar problems in designing queries for both relational and NoSQL databases, building queries for NoSQL databases does present its own unique set of challenges. Nevertheless, there are a number of things to learn from the experiences of the designers of GestureDB.

First, simple gesture recognition may not satisfactorily capture the range of a user’s intent when designing a query. To address this issue, the designers of GestureDB use an entropy-based classifier that draws on two sources of features. It narrows the set of potential gestures based on the spatial information contained in the gesture, and it prunes the space of possible user intents by examining which actions are more likely than others given the constraints imposed by the underlying database structure. From these, the classifier automatically selects the most likely intent among all possible intents. Building such a classifier may not be within the scope of this project, but the approach may prove worthy of consideration as I continue development.
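
The details of GestureDB’s classifier are beyond what I can reproduce here, but the general idea might be sketched like this. This is a hypothetical toy version, not their actual implementation; the intents, scores, and schema checks are all illustrative:

```javascript
// Toy sketch of schema-constrained gesture classification, loosely
// inspired by GestureDB's approach. All names and numbers here are
// illustrative assumptions, not the real system.

// Spatial likelihoods from a (hypothetical) gesture recognizer.
const spatialScores = {
  join: 0.45,
  union: 0.4,
  filter: 0.15,
};

// Schema constraints: a join is only plausible if the tables share a
// key; a union requires compatible columns. (Again, illustrative.)
const schemaAllows = {
  join: true, // the two tables share a foreign key
  union: false, // their column sets are incompatible
  filter: true,
};

// Prune intents the schema rules out, then pick the highest-scoring
// intent that remains.
function classifyIntent(scores, allowed) {
  const candidates = Object.keys(scores).filter((k) => allowed[k]);
  return candidates.reduce((best, k) => (scores[k] > scores[best] ? k : best));
}

console.log(classifyIntent(spatialScores, schemaAllows)); // "join"
```

Even this toy version shows why the pruning step matters: on spatial evidence alone, join and union are nearly indistinguishable here, but the schema makes one of them implausible.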

Second, GestureDB provides just-in-time access to the underlying data in order to design queries more efficiently. For instance, simple preview gestures allow the user to see the data they are querying against, so they can modify their gestures before completing them.

Finally, the ability to undo an operation adds to GestureDB’s flexibility. While this has seemed to me to be a nice-to-have feature, I see it now as even more important. While some aspects of the interactions I am designing allow for implicit undo, at some point it will be necessary to explicitly undo any operation, as well as to undo many successive operations.

There are also numerous ways in which GestureDB seems to be successful that reinforce the design I am considering. Representation of ‘tables’ as real objects that the user can manipulate seems effective. In addition, separating the interaction space into a ‘well’ where tables are selected and a ‘sandbox’ where tables are dropped in order to be shaped into a portion of the query also seems to be effective.

As Nandi and Mandel state, precious few tools for graphical construction of database queries exist for touch interfaces. This leaves me in the exciting position of working in an area where little progress has yet been made, but at the same time having little in the way of the experiences of other researchers from which to draw. These examples of similar work that I have been able to find do, fortunately, provide helpful advice on common pitfalls that I might avoid, as well as reinforcement of not only the utility of such a tool, but the appropriateness of a number of design decisions that I have already made.

First Crit

A few days ago, we had our first crits of our initial design ideas for our course project NUIs. As I’ve previously described, I’m designing a NUI for the Perceptive Pixel for exploration of a MongoDB dataset.

The biggest question around my current design is how exactly collaboration is accomplished with the NUI. In reality, this question is about more than the NUI itself; really, the question is, how do multiple parties collaborate in data exploration? To begin to consider this question, it’s important to understand what the data are in my first target use case.

This use case considers data exploration with the Emotion in Motion dataset. Emotion in Motion is a large-scale experiment that measures subjects’ physiology while they listen to different selections of music. ‘Documents’ in the dataset come primarily in three flavors: trials, signals, and media. An abbreviated trial document looks something like the following:
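
(Field names and values here are simplified for illustration, and shouldn’t be read as the exact schema.)

```
{
  "_id": ObjectId("..."),
  "media": [
    ObjectId("537e601bdf872bb71e4df26d"),
    ObjectId("..."),
    ObjectId("...")
  ],
  "answers": {
    "sex": "...",
    "age_range": "...",
    "537e601bdf872bb71e4df26d": {
      "liking": 4
    }
  }
}
```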

Those entries in the media property correspond to three different songs to which this subject listened. The answers property contains demographic information, as well as answers to questions that this subject was asked after listening to each song. For instance, after listening to the song with label ObjectId("537e601bdf872bb71e4df26d") (from the media property), the subject rated their ‘liking’ of the song as 4 on a scale of 1-5. The media ObjectIds point to media documents that look something like this (also abbreviated):
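
(Again, the field names are illustrative guesses rather than the exact schema.)

```
{
  "_id": ObjectId("537e601bdf872bb71e4df26d"),
  "artist": "...",
  "title": "...",
  "file": "..."
}
```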

And finally, each media in a trial is associated with a signal document. Here’s an abbreviated example:
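
(Field names illustrative; the arrays are truncated, as the real ones contain one entry per sample.)

```
{
  "_id": ObjectId("..."),
  "trial": ObjectId("..."),
  "media": ObjectId("537e601bdf872bb71e4df26d"),
  "signals": {
    "eda_raw": [2.31, 2.32, 2.34, ...],
    "eda_filtered": [2.30, 2.31, 2.33, ...],
    "hr": [72.1, 72.3, 72.2, ...],
    "eda_status": [1, 1, 1, ...],
    "hr_status": [1, 0, 1, ...]
  }
}
```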

Each of the properties under the signals property is a very long array, with each entry representing the instantaneous value of a given signal as measured at a specific point in time while listening to the associated media file. The entries in eda_status and hr_status are binary indicators of the acceptability of the EDA and HR signals at that moment in time. In addition, we work with a far greater number of features that are derived from the raw and filtered physiological signals.
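
To make the status arrays concrete: a downstream script might use them to mask out unacceptable samples before computing features. A minimal sketch (field names are assumptions, not the actual schema):

```javascript
// Minimal sketch: keep only the signal samples whose status flag is 1.
// The names eda and edaStatus are illustrative assumptions.
function maskByStatus(signal, status) {
  return signal.filter((_, i) => status[i] === 1);
}

const eda = [2.31, 2.32, 9.99, 2.34];
const edaStatus = [1, 1, 0, 1]; // third sample flagged as unacceptable

console.log(maskByStatus(eda, edaStatus)); // [ 2.31, 2.32, 2.34 ]
```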

Looking at one of these combined media/signal/trials in any detail takes a considerable amount of screen space. The problem is, we are approaching 40,000 ‘song listens’, and this number continues to grow daily. Within the next two years, we expect to be well beyond 100,000 listens. So, for a given song, an interface to explore signals from the, say, 2,000 subjects that have listened to the song needs to be carefully considered. And how do we go about creating an interface with which multiple people can work together to explore such a dataset?

The most obvious way to visualize data like this is to create individual plots for each type of signal/feature (tonic EDA and heart rate variability, for instance). These plots are naturally aligned vertically, as they all correspond to a common timebase. How, though, do multiple people easily manipulate and view this visualization? I’ve imagined the scenario for this project to be one in which the Perceptive Pixel is used as a tabletop interface. Thus, the most obvious arrangement of users is on either side of the table. Is each user shown their own separate visualization/interface in the orientation that is correct for them? Is the separation of displays used only during the exploration process and later combined for a larger visualization? If the exploration is to be tightly linked (each party works closely together during the exploration), how is the interface oriented? Or, does a less tightly linked interaction better suit this scenario?

These are the kinds of questions that came up during my first crit. Many of them would be easily addressed by mounting the Perceptive Pixel vertically, and in the end, this may be the best solution. I’m still enjoying the challenge of exploring ways to create a collaborative NUI using a tabletop interface that deals with content that is highly sensitive to orientation, though.

Perceptive Pixel

I’ve had the opportunity to think a bit more about this NUI-based tool for MongoDB data exploration and visualization. In addition, I’ve been able to discuss the project with Doug Bowman. I now have a bit more clarity about what I’d like to see from this interface, and about what first steps I should take.

First, Chris North introduced Virginia Tech’s new Microsoft Perceptive Pixel at the ICAT Community Playdate last Friday.

From Microsoft:

The Perceptive Pixel (PPI) by Microsoft 55″ Touch Device is a touch-sensitive computer monitor capable of detecting and processing a virtually unlimited number of simultaneous on-screen touches. It has 1920 x 1080 resolution, adjustable brightness of up to 400 nits, a contrast ratio of up to 1000:1, and a display area of 47.6 x 26.8 inches. An advanced sensor distinguishes true touch from proximal motions of palms and arms, eliminating mistriggering and false starts. With optical bonding, the PPI by Microsoft 55” Touch Device virtually eliminates parallax issues and exhibits superior brightness and contrast. And it has built-in color temperature settings to accommodate various environments and user preference.

While the unit is quite impressive, I’m most interested in how this interface might enable something truly unique for this project. Other than space around the unit, there’s no limiting factor on the number of users who might view and interact with on-screen content. There is plenty of space for multiple users to carve out their own visualizations, as well. So, I’ll be working with the Perceptive Pixel instead of the iPad. The learning curve will be steeper for me, as I’m already a competent iOS developer, but I think it will be worth the additional effort.

Second, I’m concerned about biting off more than I can chew in this project. Both data exploration and visualization (in particular, of the dataset with which I’m always working) are important to me. However, given the duration of the project, trying to get very deep into both might be too ambitious. Instead, I’ll be focusing on developing an interface for collaborative visualization of NoSQL data–data exploration can come later. This will likely mean that the first several iterations use only canned data from the dataset.

So, the first step is to jump into C#. I’m not particularly excited to work on a Microsoft stack, but if this is what working with the Perceptive Pixel requires, so be it. The next step is to begin to brainstorm design ideas–more to come on that this week.

NUI Project

One of our ongoing studies is Emotion in Motion, a large-scale experiment that collects physiological data from people while they listen to selections of music. Emotion in Motion began in 2010, while we were working as Ph.D. researchers at Queen’s University Belfast. It first ran for several months in the Science Gallery in Dublin, Ireland. Here, we went through several iterations of the experiment: the questions we asked the participants changed, the music selections changed, and so on. Since Dublin, Emotion in Motion has been staged in New York City; Bergen, Norway; and Manila, the Philippines. We are currently preparing to deploy Emotion in Motion in Taiwan for the entirety of 2015.

The data generated by Emotion in Motion were originally written to formatted text files. We wrote parsers for these files to work in the environments in which we chose to work. As Emotion in Motion’s life has continued, however, we’ve recognized that we really need a better method for storing and accessing these data. Across all of these iterations, while we’ve made a number of changes to the content of the experiment, the overall structure of the experiment has remained relatively stable: participants are always watching or listening to some form of media while we record their physiology and ask them questions about their experiences. We decided that a NoSQL database would allow us to store huge numbers of data entities that share much in common in some respects, but may vary wildly in others. For instance, while we record the same physiological signals from all participants during each media session, the lengths of the media selections are not all the same. Or, while we ask for the same demographic information from all participants, we may ask different questions in response to each media selection. The difficulty of representing these varying schemas in an RDBMS’s tables made a NoSQL solution the obvious alternative.
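
A toy example of the kind of variability I mean (shapes and field names invented for illustration): two trials can live in the same collection even though their answers don’t share a schema.

```javascript
// Two hypothetical trial documents with different shapes. In a document
// store they can sit side by side; in an RDBMS they would force either
// sparse tables or an awkward key-value layout.
const trialA = {
  answers: { sex: 'male', liking: 4, tension: 2 },
  media: ['a1', 'a2'], // two media selections
};
const trialB = {
  answers: { sex: 'female', liking: 5, familiarity: 1, chills: true },
  media: ['b1', 'b2', 'b3'], // three media selections
};

const trials = [trialA, trialB];
// Queries can still target the fields the documents do share:
const likings = trials.map((t) => t.answers.liking);
console.log(likings); // [ 4, 5 ]
```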

So, I now find myself doing a great deal of work in MongoDB. The learning curve has been surprisingly gentle, and I’m very comfortable querying through the scripting interface. One thing I have found myself wanting, though, is an easy means of quick-and-dirty visualization for data exploration and high-level analysis. Currently, my workflow is to refine queries using the scripting interface, pull the data I need from MongoDB, and then use an external tool (MATLAB, R, etc.) to visualize the data. It would be very useful to be able to visualize queries on the fly, instead of hopping through this piecemeal workflow. In addition, the modularity of MongoDB queries and aggregations lends itself well to construction and refinement through a graphical interface.
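
That modularity is easy to see in the shell: an aggregation pipeline is just an array of stage objects, which is exactly the kind of structure a graphical tool could assemble, reorder, and refine piece by piece. A sketch (collection and field names are illustrative, not the real schema):

```javascript
// Hypothetical aggregation pipeline: count listens per song among
// trials with a high 'liking' rating. Each stage is a plain object,
// so a GUI could add, remove, or reorder stages independently.
const matchStage = { $match: { 'answers.liking': { $gte: 4 } } };
const unwindStage = { $unwind: '$media' };
const groupStage = { $group: { _id: '$media', listens: { $sum: 1 } } };

const pipeline = [matchStage, unwindStage, groupStage];
// In the mongo shell, this would run as:
//   db.trials.aggregate(pipeline)
console.log(pipeline.length); // 3
```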

It’s this real, personal need for such a tool that has led me to choose to build it, using a tablet interface, as a semester-long project in Doug Bowman’s class on natural user interfaces. Some of the other ideas with which I was toying were:

  • Tabletop audio editing tool
  • Gestural music improvisation tool
  • Live music performance looping tool
  • Gestural musical score following tool

The musician in me would love to build any of those tools. Certainly, it would make the project more enjoyable and motivating for me. The researcher in me (who just needs to finish this ****ing dissertation) needs what I’ve described in order to do his work. Practicality and necessity beat out fun and excitement in this case. I’ll post more as the project progresses.

NUIs in Everyday Computing

In a recent post on the Leap Motion blog, Alex Colgan discusses the influences that fictional user interfaces (read ‘user interfaces depicted in movies’) have on the development of motion controls being developed today. He draws on examples from Minority Report, Ender’s Game, and The Avengers to illustrate his three main points. In short, these are:

  • Successful motion controls ‘make us feel powerful and in control of our environment’.
  • Successful motion controls keep the user in a state of flow.
  • Successful motion controls leverage immersion and ‘anti-immersion’ well.

I’d like to focus on the second of those points. In his post, Colgan references a description of flow from Mihaly Csikszentmihalyi, the psychologist who initially proposed the notion:

Human beings seek optimal experiences, where we feel a sense of exhilaration–a deep sense of enjoyment. In these intense moments of concentration, our ego disappears, time stands still. Art. Sport. Gaming. Hacking. Every flow activity shares one thing in common: a sense of discovery, a creative feeling of transporting us into a new reality to higher levels of performance.

Many people who speak of flow (Colgan included) discuss it only as occurring in creative activities, sports, gaming, and the like. Need this be the case? Is enabling a flow state really a goal fit only for user interfaces built for entertainment and gaming (as Wigdor and Wixon might have us believe)?

Csikszentmihalyi says no. To support this (drawing from his thousands of interviews with not only creatives and athletes, but also CEOs, shepherds, and the like) he describes seven indicators that one is in a flow state:

  1. Completely involved in what we are doing–focused, concentrated.
  2. A sense of ecstasy–of being outside everyday reality.
  3. Greater inner clarity–knowing what needs to be done, and how well we are doing.
  4. Knowing that the activity is doable–that our skills are adequate to the task.
  5. A sense of serenity–no worries about oneself, and a feeling of growing beyond the boundaries of the ego.
  6. Timelessness–thoroughly focused on the present, hours seem to pass by in minutes.
  7. Intrinsic motivation–whatever produces flow becomes its own reward.

While I don’t disagree that gaming and entertainment interfaces should aim to be conducive to flow, I’m convinced that flow has a place outside of the latest Call of Duty release. In my work, being completely involved in what I am doing, having inner clarity, having confidence in my abilities, finding serenity, and being excited and motivated to do my work are all certainly desirable and achievable. Furthermore, I would hope that the tools I choose to do my work are conducive to these, as well. While NUIs may, on the outside, seem most appropriate for gaming and entertainment, no one has yet convinced me that these are the only applications where they are appropriate. And if they are especially capable of enabling flow, we should be considering ways to incorporate them in all manner of UIs.