Photos of Banff and environs from Thrassos Pappas
Lake Louise Photos from Nick Kingsbury
Photos from Ton Kalker
SUNDAY | |
Morning | Introduction and Welcome by BIRS Station Manager |
Eero Simoncelli Photographic Image Representation with Multiscale Gradients Abstract | Talk Slides |
Ray Liu Multimedia Forensics for Traitor Tracing Abstract | Talk Slides |
Ton Kalker Secure Signal Processing Abstract | Talk Slides |
Deepa Kundur Emerging Paradigms in Sensor Network Security Abstract | Talk Slides |
Afternoon | Tsuhan Chen Taking Multi-View Imaging to a New Dimension: From Harry Nyquist to Image-Based Rendering Abstract |
Coffee Break | |
Nick Kingsbury Multi-scale Displacement Estimation and Registration for 2-D and 3-D datasets. Abstract | Talk Slides |
Adriana Dumitras Optimization Methods for State-of-the-Art Video Encoders Abstract | |
Melanie Louisa West Using Multimedia and Hip-hop Culture to Promote Math Among Under-represented Minorities Abstract | |
Program Website: http://www.mindrap.org | |
MONDAY | |
Morning | Amir Said The Need for Better Models for Coding Sparse Multimedia Representations Abstract | Talk Slides |
Russ Mersereau Vector Quantizers for Reduced Bit-Rate Coding of Correlated Sources Abstract | Talk Slides |
Sheila Hemami A Signal Processor's Approach to Modeling the Human Visual System, and Applications Abstract | |
Afternoon | Li Deng Computer Speech Recognition: Building Mathematical Models Mimicking the Human System Abstract | Talk Slides |
Mari Ostendorf Managing Spoken Documents Abstract | |
George Tzanetakis A personal history of Music Information Retrieval Abstract | |
Jose Moura Deterministic and Stochastic, Time and Space Signal Models: An Algebraic Approach Abstract | Talk Slides |
For copies of papers and reprints cited: http://www.ece.cmu.edu/~moura and follow links to journal and conference papers. | |
For further information on the algebraic approach to signal processing: http://www.ece.cmu.edu/~smart | |
For information on a tool to automatically generate software implementations of linear transforms making use of fast algorithms, some derived using the algebraic theory: http://www.spiral.net | |
Panos Nasiopoulos and Kostas Plataniotis
Digital Video for Mobile Devices and A Unified Framework for the Consumer-Grade Image Pipeline Abstract | Talk Slides |
TUESDAY | |
Morning | Philip Chou Network Coding for the Internet and Wireless Networks Abstract | Talk Slides |
Michelle Effros Information Representation for Network Systems Abstract | |
Shahram Shirani Analytical modeling of matching pursuit Abstract | Talk Slides |
WEDNESDAY | |
Morning | Thrassos Pappas Mathematical and Perceptual Models for Image Segmentation Abstract | Talk Slides |
Ling Guan A New Framework for Modeling and Recognizing Human Movement and Actions Abstract | |
Jie Liang Time Domain Lapped Transform and Its Applications Abstract | Talk Slides |
Afternoon | Alfred Hero Dimension reduction for classification Abstract | Talk Slides |
Ghassan Hamarneh Deformable Models for Image Analysis; From 'Snakes' to 'Organisms' Abstract | Talk Slides |
Discussion |
Photographic Image Representation with Multiscale Gradients
Abstract: I'll describe our recent empirical investigation and modeling of the joint statistical properties of a multiscale representation based on derivative operators. In particular, I'll describe the use of Gaussian Scale Mixtures (the product of a scalar random variable and a Gaussian vector) to model the statistics of clusters of wavelet coefficients at adjacent positions, scales and orientations. When applied to the problem of denoising, these models provide a natural generalization of both standard linear (Wiener) and thresholding estimators, and lead to substantial increases in performance. I'll also describe recent work on extending this model to include local geometry in the form of phase and orientation information.
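The heavy-tailed behavior that motivates the Gaussian scale mixture can be reproduced in a few lines. The sketch below (illustrative code, not from the talk) draws GSM samples as the product of a positive scalar multiplier and a Gaussian variable, and checks that their excess kurtosis is positive, as observed for the wavelet coefficients of photographs:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# GSM sample: x = sqrt(z) * u, with z a positive scalar multiplier
# and u a Gaussian component; the vector dimension is 1 for clarity.
z = rng.lognormal(mean=0.0, sigma=0.5, size=n)   # hidden multiplier
u = rng.standard_normal(n)                       # Gaussian component
x = np.sqrt(z) * u

# Excess kurtosis: 0 for a Gaussian, > 0 for the heavy-tailed GSM,
# mirroring the sharp-peaked wavelet-coefficient histograms.
def excess_kurtosis(v):
    v = v - v.mean()
    return (v**4).mean() / (v**2).mean()**2 - 3.0

print(excess_kurtosis(u))  # near 0
print(excess_kurtosis(x))  # clearly positive
```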
Multimedia Forensics for Traitor Tracing
The recent growth of networked multimedia systems has increased the need for techniques that protect the digital rights of multimedia. Traditional protection alone (such as encryption, authentication and time stamping) is not sufficient for protecting data after it is delivered to an authorized user or after it has traveled outside a closed system. To address post-delivery protection and introduce user accountability, a class of technologies known as digital fingerprinting is emerging. Due to the global nature of the Internet, ensuring the appropriate use of media content, however, is no longer a traditional security issue with a single threat or adversary. Rather, new threats are posed by coalitions of users who can combine their contents to undermine the fingerprints. These attacks, known as collusion attacks, provide a cost-effective method for removing an identifying fingerprint, and thus pose a strong threat to protecting the digital rights of multimedia. To mitigate the serious threat posed by collusion, theories and algorithms are being investigated and developed for constructing forensic fingerprints that can resist collusion, identify colluders, and corroborate their guilt. Multimedia forensics has therefore become an emerging field built upon the synergies between signal processing theory, cryptology, coding theory, communication theory, information theory, game theory, and the psychology of human visual/auditory perception. This talk will provide the audience with a broad overview of recent advances in multimedia forensics, with a focus on multimedia fingerprinting for traitor tracing. An approach to tracing traitors using collusion-resistant multimedia fingerprinting, one that jointly considers the encoding, embedding, and detection of fingerprints, will be presented. A general formulation of fingerprint coding and modulation with a unified framework covering orthogonal fingerprints, coded fingerprints, and group fingerprints will be discussed.
Finally, traitor-within-traitor dynamics and behavior will be modeled and analyzed. As a result, we can develop optimal strategies for both traitors and detectors.
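A minimal numerical sketch of an averaging collusion attack and correlation-based colluder detection (illustrative parameters only, not the fingerprinting system described in the talk):

```python
import numpy as np

rng = np.random.default_rng(1)
n_users, dim = 20, 4096

# Independent Gaussian sequences are near-orthogonal in high dimension,
# so they serve as "orthogonal" fingerprints here.
fingerprints = rng.standard_normal((n_users, dim))
host = rng.standard_normal(dim) * 10.0          # host signal (e.g. image)

# Three colluders average their fingerprinted copies (averaging collusion).
colluders = [2, 7, 11]
copies = host + fingerprints                    # one marked copy per user
colluded = copies[colluders].mean(axis=0)

# Correlation detector: each colluder's fingerprint survives at 1/3 energy,
# while an innocent user's fingerprint only matches noise.
scores = fingerprints @ (colluded - host) / dim
accused = np.argsort(scores)[-3:]
print(sorted(int(i) for i in accused))          # recovers the colluder set
```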
Secure Signal Processing
We observe that (professional) multimedia signals are increasingly made available only in protected format. Typically the security wrappers can only be removed by the targeted devices or applications (e.g. the DRM agent in a rendering device). This poses serious problems for intermediate processing applications that do not have access to the appropriate cryptographic keys (for liability reasons, security reasons or otherwise) and/or that do not have sufficient computational resources. In this talk we discuss options for processing protected signals in their protected format, either by adapting the cryptographic methods (e.g. homomorphic encryption) or by adapting the signal processing methods (scalable coding).
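As a concrete instance of the cryptographic route, the toy Paillier cryptosystem below is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of the plaintexts, so a party without the secret key can still perform limited processing in the encrypted domain. The parameters are tiny and purely illustrative, not secure:

```python
import math
from random import randrange

# Toy Paillier cryptosystem (additively homomorphic).
p, q = 293, 433                       # toy primes -- NOT secure
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
g = n + 1

def L(u):
    return (u - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)

def encrypt(m):
    r = randrange(1, n)
    while math.gcd(r, n) != 1:
        r = randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

c1, c2 = encrypt(17), encrypt(25)
# Processing in the protected domain: multiply ciphertexts to add plaintexts.
print(decrypt((c1 * c2) % n2))        # 42
```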
Emerging Paradigms in Sensor Network Security
This talk provides an overview of the field of sensor network security and highlights particular challenges in symmetric key distribution, secure aggregation, secure routing, and actuation security. Through examination of these problems, fundamental compromises among the degree of protection, complexity and network performance are highlighted leading to a discussion of appropriate primitives and paradigms for securing sensor networks. The talk concludes with a discussion of the principal issues for protecting emerging optical free space sensor networks and multimedia sensor networks.
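One classic compromise in symmetric key distribution is random key predistribution (in the style of Eschenauer and Gligor): each node stores a random k-subset of a pool of P keys, and two neighbors can communicate directly only if their subsets intersect. A short sketch (illustrative numbers, not from the talk) of the resulting connectivity probability:

```python
import math

def connect_prob(P, k):
    # P(two nodes share at least one key) = 1 - C(P-k, k) / C(P, k)
    return 1.0 - math.comb(P - k, k) / math.comb(P, k)

# Connectivity grows quickly with the key-ring size k, illustrating the
# trade-off between per-node storage and network connectivity.
for k in (25, 50, 100):
    print(k, round(connect_prob(10_000, k), 3))
```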
Taking Multi-View Imaging to a New Dimension: From Harry Nyquist to Image-Based Rendering
A picture is worth a thousand words. However, a single picture cannot render a whole scene; it merely renders the scene as seen from a particular viewpoint. In 1991, Adelson and Bergen proposed the concept of the plenoptic function, a seven-dimensional function that represents all the light rays in a dynamic scene. Since then, research on sampling, storing, interpolating, and reconstructing the plenoptic function has been emerging at both academic and industrial research institutions. This area of research is commonly referred to as image-based rendering or, more familiar to the signal processing community, multi-view image processing.
Recent convergence of image processing, computer vision, and computer graphics has resulted in significant progress in multi-view image processing. Now widely used in applications ranging from special effects (Remember the movie "The Matrix"?) to virtual teleconferencing, multi-view image processing has become a critical tool for creating visually exciting content. With multi-view image processing, real-world scenes can be captured and rendered directly from images captured by cameras, eliminating the need for computationally expensive modeling of 3D geometry or surface reflectance, as is often done in traditional computer graphics.
In this talk we will outline recent developments in image-based rendering. While studying the mechanism for sampling multi-view data, we will reveal the connections between image-based rendering, multidimensional multirate signal processing, and the Sampling Theorem discovered by Harry Nyquist 80 years ago!
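The Nyquist connection can be made concrete with the 1-D sampling theorem: a signal bandlimited to B Hz and sampled at a rate above 2B is recovered exactly by Whittaker-Shannon (sinc) interpolation. A small numerical check (illustrative code, not from the talk):

```python
import numpy as np

B = 3.0          # highest frequency present (Hz)
fs = 10.0        # sampling rate, above the Nyquist rate 2B
T = 1.0 / fs

def signal(t):
    return np.sin(2 * np.pi * 1.0 * t) + 0.5 * np.cos(2 * np.pi * B * t)

n = np.arange(-2000, 2001)          # sample indices (long enough to make
samples = signal(n * T)             # truncation error negligible)

def reconstruct(t):
    # Whittaker-Shannon interpolation: a sum of shifted sincs.
    return np.sum(samples * np.sinc((t - n * T) / T))

t0 = 0.137
print(signal(t0), reconstruct(t0))  # agree closely
```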
Multi-scale Displacement Estimation and Registration for 2-D and 3-D datasets.
This talk will consider the problems of displacement (or motion) estimation between pairs of 2-D images or 3-D datasets, especially for the case of non-rigid deformation as encountered in many medical imaging applications. We will show how the use of multi-scale directionally selective octave-band filters with analytic (complex) impulse responses can greatly reduce the computational load associated with displacement estimation by employing phase-based methods. In particular we will extend the techniques of Hemmendorf for use with dual-tree complex wavelets (DT CWT) and in an iterative scenario, such that the usual approximations associated with phase-based approaches are minimised. These methods rely on the shift-invariant and directional properties of the DT CWT, and are inherently resilient to shifts in the mean level and contrast of the two datasets and to noise because of the band-limited nature of the signals and the use of phase shifts to estimate displacements. They are computationally efficient because a coarse-to- fine multi-scale approach is used and are well-suited to displacement fields that can be represented by locally-affine models with smoothly varying parameters. The algorithm can also be designed largely to ignore data in areas where the two datasets do not match (e.g. where a tumour is present in one dataset but not in the other). We believe that the computational advantages of our methods will be particularly helpful for 3-D registration tasks.
Optimization Methods for State-of-the-Art Video Encoders
Numerous efforts have focused on identifying the best methods to optimize video encoders. These efforts concentrate on removing spatial, temporal and perceptual redundancies from a video source with the objective of representing the data efficiently. However, so far there is no unique "best method" to optimize a video encoder. Instead, there exist various methods that address (usually distinctly) different aspects of the optimization problem and different applications. This diversity is motivated and enabled by the tremendous flexibility allowed in encoder design by video coding standards, the development of unoptimized video encoding tools as part of the non-normative verification or experimental models during the standards' development, and the strong competition in the video industry. This talk will present a taxonomy and an overview of the methods that enable video encoder optimization through tradeoffs at the algorithmic, software and hardware implementation levels.
MindRap: Using Multimedia and Hip-hop Culture to Promote Math Among Under-represented Minorities
There is widespread agreement among educators that there is a strong need for programs that will increase math and science competency among under-represented minority students. Lack of interest and motivation are known contributing factors for that lack of representation. We propose to combine multimedia with elements of the hip-hop culture to promote interest in math among under-represented minorities.
In today's society, hip-hop music has captured the minds of urban youth. Music sales, fashion trends, and advertisement strategies reflect this. Consequently, we believe that incorporating hip-hop into math instruction for under-represented minorities holds great promise for success. The elements of rhyme, rhythm, and repetition make rap -- hip-hop's linguistic component -- an excellent creative vehicle for presenting concepts that require memorization. Math, in particular, lends itself to rap because the creative use of natural language provides a platform for transferring the conceptualization of math into real life experiences through story-telling.
By combining the learning experience with an activity that is already an integral part of a person's life, we believe that we will not only increase interest in learning, but will also maximize information retention. This, coupled with the incorporation of multimedia elements that will be widely accessible (on public display for peers and/or published for a general audience), will motivate the individual (or group) to do their very best.
An innovative aspect of the proposed approach is that it combines teaching students at the elementary school level with multimedia content created by students at the high school level. This accomplishes two goals: it makes it easier to motivate the younger students and, at the same time, provides a great vehicle for exposing the older students to multimedia.
The Need for Better Models for Coding Sparse Multimedia Representations
A main objective in multimedia signal processing is to eliminate redundancy and create sparse representations. However, for compression an effective representation also needs to be efficiently entropy coded, and we need good combined models both for the signal and for how its information is distributed, in the sense of what and where the most important components are. Simple recursive set-partitioning methods were shown to be very effective for coding sparse data, both in terms of compression and computational complexity, but their use has not yet been extended to more complicated media types. In this talk we discuss the challenges and possibilities for improving performance using more sophisticated data models.
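The efficiency of recursive set partitioning on sparse data can be seen in miniature: a set whose maximum magnitude falls below the threshold is dismissed with a single '0' bit, so large insignificant regions cost almost nothing. A toy 1-D significance pass (illustrative only, far simpler than SPIHT-style coders):

```python
import numpy as np

def significance_bits(coeffs, threshold, bits):
    # Recursive binary set partitioning of a 1-D coefficient array.
    if len(coeffs) == 0:
        return
    if np.max(np.abs(coeffs)) < threshold:
        bits.append(0)                      # whole set insignificant: 1 bit
        return
    if len(coeffs) == 1:
        bits.append(1)                      # a significant coefficient
        return
    bits.append(1)                          # significant set: split in two
    mid = len(coeffs) // 2
    significance_bits(coeffs[:mid], threshold, bits)
    significance_bits(coeffs[mid:], threshold, bits)

rng = np.random.default_rng(3)
x = np.zeros(1024)
x[rng.choice(1024, size=8, replace=False)] = rng.normal(10, 1, 8)  # sparse

bits = []
significance_bits(x, 1.0, bits)
print(len(bits))   # far fewer than the 1024 bits of a flat significance map
```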
Vector Quantizers for Reduced Bit-Rate Coding of Correlated Sources
It is well known that vectors derived from consecutive segments of most real-world signals are strongly correlated. This intervector correlation is not exploited in a standard VQ system. Many techniques proposed to exploit this correlation render the VQ suboptimal or require buffering and thus introduce encoding delay. This talk will present two alternative methods. The first approach, cache VQ, uses a cache memory to reduce the bit rate and the encoding time, at the cost of a slight, but controllable, increase in the coding error. The second approach, recently developed by Krishnan, Barnwell, and Anderson at Georgia Tech, overcomes cache VQ's limitations. Their approach, called dynamic codebook reordering, dramatically reduces the entropy in the representation of the VQ symbols, which can then be exploited for lossless compression. Dynamic codebook reordering can significantly reduce the bit rate for strongly correlated sources without introducing any additional distortion, coding delay, or sub-optimality when compared to a standard VQ system. Examples illustrating the efficiencies of these two techniques will be presented for both speech and video signals.
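The effect of dynamic codebook reordering can be imitated with a move-to-front transform on a correlated index stream (a deliberately simplified stand-in for the Krishnan-Barnwell-Anderson scheme, for illustration only): the chosen codewords are unchanged, but the transmitted ranks concentrate near zero and their entropy drops, which a lossless coder can then exploit.

```python
import numpy as np

rng = np.random.default_rng(4)
K = 64                                   # codebook size
# Correlated index stream: the source tends to repeat its previous symbol.
idx, cur = [], 0
for _ in range(20_000):
    if rng.random() < 0.9:
        idx.append(cur)
    else:
        cur = int(rng.integers(K))
        idx.append(cur)

def entropy(symbols):
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# Move-to-front reordering: transmit the rank of each index in a list
# ordered by recency, then move that index to the front.
order, mtf = list(range(K)), []
for s in idx:
    r = order.index(s)
    mtf.append(r)
    order.insert(0, order.pop(r))

print(entropy(idx), entropy(mtf))   # reordered ranks have much lower entropy
```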
A Signal Processor's Approach to Modeling the Human Visual System, and Applications
Current image and video compression algorithms (e.g., JPEG-2000, H.264) provide very high efficiency compression and excellent quality at relatively high bit rates. These algorithms operate by treating images and video as traditional "signals," employing efficient transformations, correlation-based models, and entropy coding. Human visual system characteristics have been successfully applied to high-rate signal-based compression, where stimuli such as compression-induced distortions are below the visibility threshold; i.e., humans cannot see them. Operation of such signal-based compression algorithms at low rates, in which compression-induced distortions are clearly visible, has to date operated based on visual system rules-of-thumb and has produced moderate success for images, while little has been done for video.
In this talk, I will present our recent results on characterizing the human visual system in a manner which allows for immediate incorporation into imaging and video applications such as compression and quality measurement at not only high rates/low distortions but also at low rates/high distortions. Results will be presented in two distinct areas: vision-based results explain how humans perceive stimuli, while engineering-motivated results allow us to incorporate our characterizations into practical algorithms.
Computer Speech Recognition: Building Mathematical Models Mimicking the Human System
The main goal of computer speech recognition/understanding is to automatically convert naturally uttered human speech into its corresponding text (and then into its meaning). While amazing success, both technologically and commercially, has been achieved in the past by straightforward mathematical methods (e.g., hidden Markov modeling, maximum likelihood and discriminative learning, dynamic programming, etc.), solutions of the remaining problems leading to its ultimate success appear to require a deep understanding of human speech recognition mechanisms. This talk will analyze various human sub-systems, including linguistic-concept generator, motor-control, articulation, vocal tract acoustic propagation, ears, auditory pathways, and auditory cortex, working in synergy to accomplish the remarkable task of highly robust, low-error speech recognition/perception and understanding. How to abstract the essence of such human information processing power in building a computer system with similar (or better) performance? How can we build mathematical models to enable the development of advanced machine-learning algorithms and techniques that will run efficiently in a computer? Is it possible to explore and exploit some special power of the computing machines inherently lacking in the human system so as to achieve super-human speech recognition? These are some of the issues to be addressed in this talk.
Managing Spoken Documents
As storage costs drop and bandwidth increases, there has been a rapid growth of information available via the web or in online archives, raising problems of finding and interpreting collections of documents. Significant recent progress has been made in text retrieval, analysis, summarization and translation, but much of this work has focused on written language. Increasingly, speech and video signals are also available -- including TV and radio broadcasts, congressional records, oral histories, voicemail, call center recordings, etc. -- which can be thought of as "spoken documents". Because it takes longer to listen to audio than to read text, spoken documents are clearly a prime candidate for automatic indexing, information extraction, and other such technologies. In this talk, we overview speech processing technology that underlies spoken document management, including mathematical frameworks for both word and metadata recognition, and for integrating video and language cues. In addition, we discuss issues that arise in text processing when moving from written to spoken language and implications for statistical models of language.
A personal history of Music Information Retrieval
Music Information Retrieval (MIR) is an emerging research area that explores how large digital collections of music can be effectively analyzed for searching and browsing. It combines ideas from many different fields including Signal Processing, Machine Learning, Music Cognition, and Human-Computer Interaction. In this talk, I will give a historical overview of MIR with specific emphasis on topics with which I have more personal experience, such as: audio feature extraction, automatic musical genre classification, rhythm analysis, query-by-humming, and sensor-enhanced musical instruments. I will conclude the talk by making predictions about the future of MIR and how it will radically transform the way music is produced, distributed and consumed.
Deterministic and Stochastic, Time and Space Signal Models: An Algebraic Approach
We are all familiar with (infinite) "time" signal processing: time shifts, filters and convolution, signals, Fourier and z-transforms, spectrum, fast algorithms. Images, of course, are not "time" but "space" objects. Also, they are "finite" objects, i.e., defined over a finite indexing set. What is the natural concept of space shift, of space filter and convolution, spectral analysis, or "z"-transform, as well as many other related concepts? To address these questions, we go beyond linear algebra to present an algebraic approach where time (signal) and space (image) processing are instantiations of the same mathematical structure. The basic building block is the signal model - a triplet (A, M, φ) of an algebra A of filters, a module M of signals, and a generalization of the z-transform as a bijective linear mapping φ from a vector space into the module of signals. The shift is naturally interpreted as a generator of the algebra of filters, boundary conditions connect finite with infinite indexing sets, the trigonometric transforms (e.g., DCTs) are appropriate Fourier transforms, and the C-transform is the z-transform. More than a mathematical curiosity, the algebraic approach provides the appropriate structure to extend signal and image processing beyond uniform to other grids (e.g., hexagonal or quincunx), or develop fast algorithms from a few basic principles, from which we can also derive new fast algorithms for existing and new transforms. Connections with other image models, in particular, with Gauss Markov fields, and pinned Markov diffusions will be discussed. This talk overviews recent work with Markus Pueschel on the algebraic theory of signal and image processing.
[1] Pueschel and Moura, "Algebraic Theory of Signal Processing," submitted.
[2] Pueschel and Moura, "The Algebraic Approach to the DCT and DST and their Fast Algorithms," SIAM Journal on Computing, 32(5):1280-1316, 2003.
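One concrete instance of the algebraic viewpoint can be checked numerically: the DCT-II basis diagonalizes the finite second-difference ("space shift") operator under reflecting boundary conditions, which is the sense in which the DCT is the Fourier transform of the corresponding signal model. The sketch below verifies this standard fact (illustrative code, not taken from the papers above):

```python
import numpy as np

N = 8
# Second-difference operator with reflecting (Neumann) boundary conditions:
# the finite "space shift" structure behind the DCT-II signal model.
K = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
K[0, 0] = K[-1, -1] = 1.0

# DCT-II basis vectors v_k[n] = cos(pi * k * (n + 1/2) / N) are its
# eigenvectors, with eigenvalues 2 - 2*cos(pi * k / N).
n = np.arange(N)
for k in range(N):
    v = np.cos(np.pi * k * (n + 0.5) / N)
    lam = 2 - 2 * np.cos(np.pi * k / N)
    assert np.allclose(K @ v, lam * v)

print("DCT-II basis diagonalizes the Neumann second-difference operator")
```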
Panos Nasiopoulos and Kostas Plataniotis
Digital Video for Mobile Devices
Mobile wireless technologies and digital video broadcast technologies are gradually converging, with efforts from 3GPP and DVB 2.0 to complete this merging in the upcoming generations of mobile technologies. In order to support this convergence, existing video technologies have to be upgraded to ensure the reliability and quality of the delivered content. This calls for highly efficient video codecs in addition to reliable error resilience techniques that overcome the bandwidth constraints and highly error-prone conditions of wireless networks.
A Unified Framework for the Consumer-Grade Image Pipeline
In this talk a new modeling and processing approach suitable for consumer-grade image processing will be presented.
Using vector modeling principles, nonlinear image operators and adaptive filtering concepts, single-sensor camera image processing problems are treated from a global viewpoint yielding new classes of processing solutions.
Some of the varied applications of the framework will be covered; namely spectral interpolation (demosaicking), spatial interpolation of the acquired (mosaic-like) single-sensor gray-scale images as well as demosaicked full-color images, demosaicked image post-processing and color image enhancement, camera image denoising and sharpening, camera image compression, spatio-temporal video demosaicking, and camera image indexing and rights management.
Results obtained using the framework will be provided. The list of topics to be covered, while certainly not exhaustive, provides a good indication of the usefulness and often the necessity of our framework in consumer-grade image processing.
Open research problems and other potential applications of the framework will also be discussed.
Information Representation for Network Systems
Analytical modeling of matching pursuit
Matching pursuit is a greedy algorithm that decomposes a signal over a redundant dictionary of basis functions. It has recently found applications in many areas including image and video processing. In this talk an analytical model of the matching pursuit algorithm operating on uniformly distributed signals is proposed. The model expresses the relationship between the bit rate and matching pursuit coder parameters such as dictionary size, quantization step size, distortion and the dimension of the signal. This relationship can be used to optimize the dictionary size and quantization step size for minimum bit rate. The model is verified through experimental results, and its accuracy is validated for different system parameters.
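The greedy decomposition itself is compact: pick the dictionary atom most correlated with the residual, subtract its contribution, and repeat. A minimal matching pursuit over a random dictionary (illustrative parameters, not the analytical model of the talk):

```python
import numpy as np

rng = np.random.default_rng(5)
dim, n_atoms = 64, 256

# Overcomplete dictionary of unit-norm random atoms.
D = rng.standard_normal((dim, n_atoms))
D /= np.linalg.norm(D, axis=0)

# Signal built from 3 atoms plus a little noise.
true_atoms = [10, 50, 200]
x = D[:, true_atoms] @ np.array([3.0, -2.0, 1.5])
x += 0.01 * rng.standard_normal(dim)

residual, chosen = x.copy(), []
for _ in range(3):
    corr = D.T @ residual
    k = int(np.argmax(np.abs(corr)))    # most correlated atom
    chosen.append(k)
    residual = residual - corr[k] * D[:, k]  # remove its contribution

print(sorted(chosen), np.linalg.norm(residual) / np.linalg.norm(x))
```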
Mathematical and Perceptual Models for Image Segmentation
We consider the segmentation of images of natural scenes. One of the challenges of this problem is that the statistical characteristics of perceptually uniform regions are spatially-varying due to effects of lighting, perspective, scale changes, etc. A second challenge is the extraction of perceptually relevant information. First, we consider the problem of segmenting images of objects with smooth surfaces. The images are modeled as smooth spatially-varying functions with sharp discontinuities at the segment boundaries, plus white Gaussian noise. We discuss an adaptive clustering algorithm for segmentation. It is a generalization of the K-means clustering algorithm to include spatial constraints and to account for local intensity variations in the image. The spatial constraints are modeled through the use of Gibbs/Markov random fields, while the local intensity variations are accounted for in an iterative procedure involving averaging over a sliding window whose size decreases as the algorithm progresses. We also consider a hierarchical implementation that results in better performance and computational efficiency. We then discuss an adaptive perceptual color-texture segmentation algorithm that is based on low-level features for color and texture. It combines knowledge of human perception with an understanding of signal characteristics in order to segment natural scenes into perceptually/semantically uniform regions. It is based on two types of spatially adaptive low-level features. The first describes the local color composition in terms of spatially adaptive dominant colors, and the second describes the spatial characteristics of the grayscale component of the texture. Key segmentation parameters are determined on the basis of subjective tests. The resulting segmentations convey semantic information that can be used for content-based retrieval.
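The K-means core that the adaptive clustering algorithm generalizes can be sketched directly; on a noisy piecewise-constant image it already recovers the two regions (illustrative code only, without the Gibbs/Markov spatial constraints or sliding-window adaptation described above):

```python
import numpy as np

rng = np.random.default_rng(6)
img = np.zeros((32, 32))
img[:, 16:] = 1.0                                # two-region test image
img += 0.1 * rng.standard_normal(img.shape)      # additive Gaussian noise

# Plain K-means on pixel intensities (K = 2).
pixels = img.ravel()
centers = np.array([pixels.min(), pixels.max()]) # simple initialization
for _ in range(10):
    labels = np.abs(pixels[:, None] - centers[None, :]).argmin(axis=1)
    centers = np.array([pixels[labels == k].mean() for k in range(2)])

seg = labels.reshape(img.shape)
accuracy = np.mean(seg == (np.arange(32) >= 16)[None, :])
print(accuracy)   # near 1.0 for well-separated regions
```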
A New Framework for Modeling and Recognizing Human Movement and Actions
Human-computer interaction (HCI) is a key research area in many scientific disciplines. I will start the talk with an overview of concepts, history and recent developments in HCI: face, speech, gesture, human emotion and human actions, with emphasis on emotion and action recognition. I will then focus on a fundamental but under-investigated research area in HCI: modeling and recognizing human movement and actions. Inspired by the movement notation systems used in dance and the paradigm of phonemes used in continuous speech recognition, we developed a Continuous Human Movement Recognition (CHMR) framework. The framework is based on a novel paradigm, the alphabet of dynemes, the smallest contrastive dynamic units of human movement. A Differential Evolution-Monte Carlo particle filter is introduced, which has demonstrated highly effective and robust characteristics in tracking basic human movement skills. Using multiple hidden Markov models, the recognition process attempts to infer the human movement skill that could have produced the observed sequence of dynemes. Recent anthropometric data shows that the famous "average sized human" model in Leonardo da Vinci's drawing of the human figure is a fallacy, and that there is no one who is average in 10 dimensions. Incorporating highly accurate biometric features into the CHMR framework, we have been able to demonstrate the effectiveness of the framework in biometrics, biomedical analysis, and recognition of human skills. We are convinced that this framework will potentially form the enabling technology for biometric authentication systems in a broad range of applications such as security/surveillance, biomedicine/physiotherapy, special effects in motion picture production, digital asset management, battlefield surveillance, and coaching/training/judging in sports and performing arts, to name a few.
Time Domain Lapped Transform and Its Applications
In this talk, the theory and applications of the time domain lapped transform will be reviewed, including the design of fast transforms, its application in wavelet-based image and video coding, and error-resilient design for multiple description coding.
Dimension reduction for classification
There has been intense interest in analysis of massively complex data sets with thousands of dimensions. Dimension reduction methods are critical components of any analysis method due to the requirements of computation and noise reduction. In this talk we will present new variational methods of dimension reduction that explicitly target classification, anomaly detection, or other tasks.
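A minimal example of the task-driven reductions discussed above is Fisher's linear discriminant, which projects two-class data onto the single most discriminative direction (a classical baseline, not one of the talk's variational methods):

```python
import numpy as np

rng = np.random.default_rng(7)
d, n = 50, 500
mean_shift = np.zeros(d)
mean_shift[0] = 3.0                      # classes differ along one axis

X0 = rng.standard_normal((n, d))         # class 0
X1 = rng.standard_normal((n, d)) + mean_shift   # class 1

# Fisher direction: w = Sw^{-1} (m1 - m0), maximizing between-class
# separation relative to within-class scatter.
m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
Sw = np.cov(X0.T) + np.cov(X1.T)
w = np.linalg.solve(Sw, m1 - m0)
w /= np.linalg.norm(w)

# 1-D projections are nearly separable even though the data is 50-D.
p0, p1 = X0 @ w, X1 @ w
threshold = (p0.mean() + p1.mean()) / 2
accuracy = (np.mean(p0 < threshold) + np.mean(p1 > threshold)) / 2
print(accuracy)
```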
Deformable Models for Image Analysis; From 'Snakes' to 'Organisms'
I will start by giving a short overview of image segmentation and registration. I will then focus on deformable models ('Snakes' and others) for image segmentation and mention issues related to incorporating prior knowledge. I will then present our work on 'deformable organisms', an artificial-life framework for image analysis incorporating high-level, intelligent, intuitive control of shape deformations. Various application examples will be presented throughout the talk.