Pacific Institute for the Mathematical Sciences
Mathematical Sciences Research Institute
with the participation of MITACS

05w5505 Multimedia and Mathematics
Programme

Back to 05w5505 home page

Photos of Monday Family Daytrip by Lolly Gray

Photos of Banff and environs from Thrasos Pappas

Lake Louise Photos from Nick Kingsbury

Photos from Ton Kalker

SUNDAY

Morning Introduction and Welcome by BIRS Station Manager
Eero Simoncelli
Photographic Image Representation with Multiscale Gradients Abstract
Talk Slides
Ray Liu
Multimedia Forensics for Traitor Tracing Abstract
Talk Slides
Ton Kalker
Secure Signal Processing Abstract
Talk Slides
Deepa Kundur
Emerging Paradigms in Sensor Network Security Abstract
Talk Slides
Afternoon Tsuhan Chen
Taking Multi-View Imaging to a New Dimension: From Harry Nyquist to Image-Based Rendering Abstract
Coffee Break
Nick Kingsbury
Multi-scale Displacement Estimation and Registration for 2-D and 3-D datasets. Abstract
Talk Slides
Adriana Dumitras
Optimization Methods for State-of-the-Art Video Encoders Abstract
Melanie Louisa West
Using Multimedia and Hip-hop Culture to Promote Math Among Under-represented Minorities Abstract
Program Website: http://www.mindrap.org

MONDAY

Morning Amir Said
The Need for Better Models for Coding Sparse Multimedia Representations Abstract
Talk Slides
Russ Mersereau
Vector Quantizers for Reduced Bit-Rate Coding of Correlated Sources Abstract
Talk Slides
Sheila Hemami
A Signal Processor's Approach to Modeling the Human Visual System, and Applications Abstract
Afternoon Li Deng
Computer Speech Recognition: Building Mathematical Models Mimicking the Human System Abstract
Talk Slides
Mari Ostendorf
Managing Spoken Documents Abstract
George Tzanetakis
A personal history of Music Information Retrieval Abstract
Jose Moura
Deterministic and Stochastic, Time and Space Signal Models: An Algebraic Approach Abstract
Talk Slides
For copies of cited papers and reprints: http://www.ece.cmu.edu/~moura (follow the links to journal and conference papers).
For further information on the algebraic approach to signal processing: http://www.ece.cmu.edu/~smart
For information on a tool that automatically generates software implementations of linear transforms using fast algorithms, some derived from the algebraic theory: http://www.spiral.net
Panos Nasiopoulos and Kostas Plataniotis
Digital Video for Mobile Devices and A Unified Framework for the Consumer-Grade Image Pipeline Abstract
Talk Slides

TUESDAY

Morning Philip Chou
Network Coding for the Internet and Wireless Networks Abstract
Talk Slides
Michele Effros
Information Representation for Network Systems Abstract
Shahram Shirani
Analytical modeling of matching pursuit Abstract
Talk Slides

WEDNESDAY

Morning Thrassos Pappas
Mathematical and Perceptual Models for Image Segmentation Abstract
Talk Slides
Ling Guan
A New Framework for Modeling and Recognizing Human Movement and Actions Abstract
Jie Liang
Time Domain Lapped Transform and Its Applications Abstract
Talk Slides
Afternoon Alfred Hero
Dimension reduction for classification Abstract
Talk Slides
Ghassan Hamarneh
Deformable Models for Image Analysis; From 'Snakes' to 'Organisms' Abstract
Talk Slides

Discussion

ABSTRACTS

Eero Simoncelli

Photographic Image Representation with Multiscale Gradients

I'll describe our recent empirical investigation and modeling of the joint statistical properties of a multiscale representation based on derivative operators. In particular, I'll describe the use of Gaussian Scale Mixtures (the product of a scalar random variable and a Gaussian vector) to model the statistics of clusters of wavelet coefficients at adjacent positions, scales and orientations. When applied to the problem of denoising, these models provide a natural generalization of both standard linear (Wiener) and thresholding estimators, and lead to substantial increases in performance. I'll also describe recent work on extending this model to include local geometry in the form of phase and orientation information.
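A minimal numerical sketch of the model class: a Gaussian scale mixture x = sqrt(z)·u is heavier-tailed than a Gaussian, matching the statistics of photographic wavelet coefficients. The lognormal choice for the multiplier z is an assumption made here for illustration; the abstract does not fix a particular mixing density.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Gaussian scale mixture: x = sqrt(z) * u with a positive scalar
# multiplier z and u ~ N(0, 1).  The lognormal z is an illustrative
# assumption, not the talk's specific choice.
z = rng.lognormal(mean=0.0, sigma=0.5, size=n)
u = rng.standard_normal(n)
x = np.sqrt(z) * u

# GSMs are leptokurtic (positive excess kurtosis), reproducing the
# heavy-tailed histograms observed for wavelet coefficients of images.
excess_kurtosis = np.mean(x**4) / np.mean(x**2) ** 2 - 3.0
```

For this mixing density the excess kurtosis is positive (about 3(e^{σ²} − 1) in expectation), whereas a pure Gaussian would give zero.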

Back to Schedule

Ray Liu

Multimedia Forensics for Traitor Tracing

The recent growth of networked multimedia systems has increased the need for techniques that protect the digital rights of multimedia. Traditional protection alone (such as encryption, authentication and time stamping) is not sufficient for protecting data after it is delivered to an authorized user or after it has traveled outside a closed system. To address post-delivery protection and introduce user accountability, a class of technologies known as digital fingerprinting is emerging. Due to the global nature of the Internet, however, ensuring the appropriate use of media content is no longer a traditional security issue with a single threat or adversary. Rather, new threats are posed by coalitions of users who can combine their copies of the content to undermine the fingerprints. These attacks, known as collusion attacks, provide a cost-effective method for removing an identifying fingerprint, and thus pose a strong threat to protecting the digital rights of multimedia. To mitigate the serious threat posed by collusion, theories and algorithms are being investigated and developed for constructing forensic fingerprints that can resist collusion, identify colluders, and corroborate their guilt. Multimedia forensics has therefore become an emerging field built upon the synergies between signal processing theory, cryptology, coding theory, communication theory, information theory, game theory, and the psychology of human visual/auditory perception. This talk will provide the audience with a broad overview of recent advances in multimedia forensics, with a focus on multimedia fingerprinting for traitor tracing. Tracing traitors using collusion-resistant fingerprinting for multimedia that jointly considers the encoding, embedding, and detection of fingerprints will be presented. A general formulation of fingerprint coding and modulation, with a unified framework covering orthogonal fingerprints, coded fingerprints, and group fingerprints, will be discussed.
Finally, traitor-within-traitor dynamics and behavior will be modeled and analyzed. As a result, we can develop optimal strategies for both traitors and detectors.
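The collusion/detection trade-off can be sketched numerically. This is a toy model, not the talk's scheme: i.i.d. Gaussian ("orthogonal") fingerprints, an averaging collusion by three users, and a non-blind correlation detector that knows the host signal.

```python
import numpy as np

rng = np.random.default_rng(1)
n_users, N = 20, 5000

# One i.i.d. Gaussian fingerprint per user, added to a common host signal.
host = rng.standard_normal(N)
W = rng.standard_normal((n_users, N))
copies = host + W

# Averaging collusion: each traitor's fingerprint survives only at
# one-third strength in the pirated copy, plus some extra noise.
colluders = {2, 7, 11}
pirated = copies[sorted(colluders)].mean(axis=0) + 0.1 * rng.standard_normal(N)

# Non-blind correlation detector: subtract the known host, correlate
# with every user's fingerprint, and accuse the highest scorers.
scores = (pirated - host) @ W.T / N
accused = set(np.argsort(scores)[-3:].tolist())
```

With orthogonal fingerprints, each colluder's correlation concentrates near 1/K (here 1/3) while innocent users' scores concentrate near zero, which is why averaging alone does not defeat the detector at this coalition size.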

Back to Schedule

Ton Kalker

Secure Signal Processing

We observe that (professional) multimedia signals are increasingly made available only in protected format. Typically the security wrappers can only be removed by the targeted devices or applications (e.g. the DRM agent in a rendering device). This poses serious problems for intermediate processing applications that do not have access to the appropriate cryptographic keys (for liability reasons, security reasons or otherwise) and/or that do not have sufficient computational resources. In this talk we discuss options for processing protected signals in their protected format, either by adapting the cryptographic methods (e.g. homomorphic encryption) or by adapting the signal processing methods (scalable coding).
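As a toy illustration of processing in the encrypted domain (not a scheme from the talk): textbook RSA is multiplicatively homomorphic, so a party holding only ciphertexts and the public key can compute an encryption of a product without ever decrypting. The tiny parameters below are for demonstration only and are insecure by design.

```python
# Toy multiplicative homomorphism of textbook RSA: E(a)*E(b) = E(a*b) mod n.
p, q, e = 61, 53, 17                 # tiny demo parameters, insecure
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent (Python 3.8+)

def enc(m):
    return pow(m, e, n)

def dec(c):
    return pow(c, d, n)

a, b = 7, 12
c_prod = (enc(a) * enc(b)) % n       # computed on ciphertexts only
```

Practical secure signal processing uses semantically secure schemes (e.g. additively homomorphic ones), but the algebraic principle is the same: an operation on ciphertexts induces a useful operation on the hidden plaintexts.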

Back to Schedule

Deepa Kundur

Emerging Paradigms in Sensor Network Security

This talk provides an overview of the field of sensor network security and highlights particular challenges in symmetric key distribution, secure aggregation, secure routing, and actuation security. Through examination of these problems, fundamental compromises among the degree of protection, complexity and network performance are highlighted leading to a discussion of appropriate primitives and paradigms for securing sensor networks. The talk concludes with a discussion of the principal issues for protecting emerging optical free space sensor networks and multimedia sensor networks.

Back to Schedule

Tsuhan Chen

Taking Multi-View Imaging to a New Dimension: From Harry Nyquist to Image-Based Rendering

A picture is worth a thousand words. However, a single picture cannot render the whole scene; it merely renders the scene as seen from a particular viewpoint. In 1991, Adelson and Bergen proposed the concept of the plenoptic function, a seven-dimensional function that represents all the light rays in a dynamic scene. Since then, research on sampling, storing, interpolating, and reconstructing the plenoptic function has been emerging at both academic and industrial research institutions. This area of research is commonly referred to as image-based rendering or, as it is more familiar to the signal processing community, multi-view image processing.

Recent convergence of image processing, computer vision, and computer graphics has resulted in significant progress in multi-view image processing. Now widely used in applications ranging from special effects (Remember the movie "The Matrix"?) to virtual teleconferencing, multi-view image processing has become a critical tool for creating visually exciting content. With multi-view image processing, real-world scenes can be captured and rendered directly from images captured by cameras, eliminating the need for computationally expensive modeling of 3D geometry or surface reflectance, as is often done in traditional computer graphics.

In this talk we will outline recent developments in image-based rendering. While studying the mechanism for sampling multi-view data, we will reveal the connections between image-based rendering, multidimensional multirate signal processing, and the Sampling Theorem discovered by Harry Nyquist 80 years ago!
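The 1-D version of the sampling constraint underlying plenoptic sampling is easy to demonstrate: below the Nyquist rate, two different frequencies produce identical samples (aliasing), which is exactly what a plenoptic sampling analysis must rule out before views can be interpolated from camera samples.

```python
import numpy as np

# Aliasing: sampled at 8 Hz (below twice its frequency), a 7 Hz cosine
# yields exactly the same samples as a 1 Hz cosine, so the two signals
# cannot be told apart after sampling.
fs = 8.0
n = np.arange(32)
seven_hz = np.cos(2 * np.pi * 7 * n / fs)
one_hz = np.cos(2 * np.pi * 1 * n / fs)
aliased = np.allclose(seven_hz, one_hz)
```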

Back to Schedule

Nick Kingsbury

Multi-scale Displacement Estimation and Registration for 2-D and 3-D datasets.

This talk will consider the problems of displacement (or motion) estimation between pairs of 2-D images or 3-D datasets, especially for the case of non-rigid deformation as encountered in many medical imaging applications. We will show how the use of multi-scale directionally selective octave-band filters with analytic (complex) impulse responses can greatly reduce the computational load associated with displacement estimation by employing phase-based methods. In particular we will extend the techniques of Hemmendorf for use with dual-tree complex wavelets (DT CWT) and in an iterative scenario, such that the usual approximations associated with phase-based approaches are minimised. These methods rely on the shift-invariant and directional properties of the DT CWT, and are inherently resilient to shifts in the mean level and contrast of the two datasets and to noise because of the band-limited nature of the signals and the use of phase shifts to estimate displacements. They are computationally efficient because a coarse-to- fine multi-scale approach is used and are well-suited to displacement fields that can be represented by locally-affine models with smoothly varying parameters. The algorithm can also be designed largely to ignore data in areas where the two datasets do not match (e.g. where a tumour is present in one dataset but not in the other). We believe that the computational advantages of our methods will be particularly helpful for 3-D registration tasks.
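A much simpler relative of the phase-based idea, as a 1-D sketch (this is classic FFT phase correlation, not the talk's multi-scale complex-wavelet method): the phase of the cross-spectrum between two shifted signals encodes the displacement, and its inverse FFT peaks at the shift.

```python
import numpy as np

rng = np.random.default_rng(2)
N, true_shift = 256, 17

# A smoothed random test signal and a circularly shifted copy.
x = np.convolve(rng.standard_normal(N), np.ones(8) / 8, mode="same")
y = np.roll(x, true_shift)

# Phase correlation: normalize the cross-spectrum to keep phase only;
# the inverse FFT of the phase term is a peak at the displacement.
X, Y = np.fft.fft(x), np.fft.fft(y)
cross = np.conj(X) * Y
cross /= np.abs(cross) + 1e-12
peak = int(np.argmax(np.fft.ifft(cross).real))
shift = peak if peak <= N // 2 else peak - N
```

The dual-tree complex wavelet approach in the talk applies the same phase-shift principle locally and across scales, which is what makes it robust to non-rigid deformation where a single global FFT phase would fail.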

Back to Schedule

Adriana Dumitras

Optimization Methods for State-of-the-Art Video Encoders

Numerous efforts have been focused toward identifying the best methods to optimize video encoders. These efforts focused on removing spatial, temporal and perceptual redundancies from a video source with the objective of representing the data efficiently. However, so far there is no unique "best method" to optimize a video encoder. Instead, there exist various methods that address (usually distinctly) different aspects of the optimization problem and different applications. This diversity is motivated and enabled by the tremendous flexibility allowed in the encoder design by video coding standards, the development of unoptimized video encoding tools as part of the non-normative verification or experimental models in the standards' developments, and the powerful competition in the video industry. This talk will present a taxonomy and an overview of the methods that enable video encoder optimization by tradeoffs at the algorithmic, software and hardware implementation levels.

Back to Schedule

Melanie Louisa West

MindRap: Using Multimedia and Hip-hop Culture to Promote Math Among Under-represented Minorities

There is widespread agreement among educators that there is a strong need for programs that will increase math and science competency among under-represented minority students. Lack of interest and motivation are known contributing factors for that lack of representation. We propose to combine multimedia with elements of the hip-hop culture to promote interest in math among under-represented minorities.

In today's society, hip-hop music has captured the minds of urban youth. Music sales, fashion trends, and advertisement strategies reflect this. Consequently, we believe that incorporating hip-hop into math instruction for under-represented minorities holds great promise for success. The elements of rhyme, rhythm, and repetition make rap -- hip-hop's linguistic component -- an excellent creative vehicle for presenting concepts that require memorization. Math, in particular, lends itself to rap because the creative use of natural language provides a platform for transferring the conceptualization of math into real life experiences through story-telling.

By combining the learning experience with an activity that is already an integral part of a person's life, we believe that we will not only increase interest in learning, but also maximize information retention. This, coupled with the incorporation of multimedia elements that will be widely accessible (on public display for peers and/or published for a general audience), will motivate the individual (or group) to do their very best.

An innovative aspect of the proposed approach is that it combines teaching students at the elementary school level with multimedia content created by students at the high school level. This accomplishes two goals: it makes it easier to motivate the younger students and, at the same time, provides a great vehicle for exposing the older students to multimedia.

Back to Schedule

Amir Said

The Need for Better Models for Coding Sparse Multimedia Representations

A main objective in multimedia signal processing is to numerically eliminate redundancy and create sparse representations. For compression, however, a sparse representation also needs to be efficiently entropy coded, so we need good combined models for both the signal and how its information is distributed, in the sense of what and where the most important components are. Simple recursive set-partitioning methods have been shown to be very effective for coding sparse data, both in terms of compression and computational complexity, but their use has not yet been extended to more complicated media types. In this talk we discuss the challenges and possibilities for improving performance using more sophisticated data models.

Back to Schedule

Russ Mersereau

"Vector Quantizers for Reduced Bit-Rate Coding of Correlated Sources"

It is well known that vectors derived from consecutive segments of most real-world signals are strongly correlated. This intervector correlation is not exploited in a standard VQ system. Many techniques proposed to exploit this correlation render the VQ suboptimal or require buffering and thus introduce encoding delay. This talk will present two alternative methods. The first approach, cache VQ, uses a cache memory to reduce the bit rate and the encoding time, at the cost of a slight, but controllable, increase in the coding error. The second approach, recently developed by Krishnan, Barnwell, and Anderson at Georgia Tech, overcomes cache VQ's limitations. Their approach, called dynamic codebook reordering, dramatically reduces the entropy in the representation of the VQ symbols, which can then be exploited for lossless compression. Dynamic codebook reordering can significantly reduce the bit rate for strongly correlated sources without introducing any additional distortion, coding delay, or sub-optimality when compared to a standard VQ system. Examples illustrating the efficiencies of these two techniques will be presented for both speech and video signals.
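The entropy-reduction effect can be sketched with a scalar toy. The move-to-front reordering below is a simplified stand-in for dynamic codebook reordering (the actual scheme re-sorts the codebook around the previously selected codeword, and the talk deals with vector quantizers); a decoder that mirrors the reorder reconstructs exactly the same codewords, so no extra distortion is introduced.

```python
import numpy as np

rng = np.random.default_rng(3)

# A strongly correlated AR(1) source and a uniform scalar codebook.
x = np.zeros(5000)
for t in range(1, x.size):
    x[t] = 0.95 * x[t - 1] + rng.standard_normal()
codebook = np.linspace(-8, 8, 33)

def entropy_bits(sym):
    p = np.bincount(sym) / sym.size
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Plain quantizer indices: nearest codeword per sample.
plain = np.abs(x[:, None] - codebook[None, :]).argmin(axis=1)

# Move-to-front reordering after every use: the transmitted symbol is
# the rank of the chosen codeword in the current order, so correlated
# sources produce mostly small ranks (lower first-order entropy).
order = list(range(codebook.size))
ranks = []
for i in plain:
    r = order.index(int(i))
    ranks.append(r)
    order.insert(0, order.pop(r))
ranks = np.asarray(ranks)
```

On this source the rank stream has noticeably lower first-order entropy than the raw index stream, which a lossless coder can then exploit.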

Back to Schedule

Sheila Hemami

A Signal Processor's Approach to Modeling the Human Visual System, and Applications

Current image and video compression algorithms (e.g., JPEG-2000, H.264) provide very high efficiency compression and excellent quality at relatively high bit rates. These algorithms operate by treating images and video as traditional "signals," employing efficient transformations, correlation-based models, and entropy coding. Human visual system characteristics have been successfully applied to high-rate signal-based compression, where stimuli such as compression-induced distortions are below the visibility threshold; i.e., humans cannot see them. Operation of such signal-based compression algorithms at low rates, in which compression-induced distortions are clearly visible, has to date operated based on visual system rules-of-thumb and has produced moderate success for images, while little has been done for video.

In this talk, I will present our recent results on characterizing the human visual system in a manner which allows for immediate incorporation into imaging and video applications such as compression and quality measurement at not only high rates/low distortions but also at low rates/high distortions. Results will be presented in two distinct areas: vision-based results explain how humans perceive stimuli, while engineering-motivated results allow us to incorporate our characterizations into practical algorithms.

Back to Schedule

Li Deng

Computer Speech Recognition: Building Mathematical Models Mimicking the Human System

The main goal of computer speech recognition/understanding is to automatically convert naturally uttered human speech into its corresponding text (and then into its meaning). While amazing success, both technological and commercial, has been achieved in the past by straightforward mathematical methods (e.g., hidden Markov modeling, maximum likelihood and discriminative learning, dynamic programming, etc.), solving the remaining problems on the way to its ultimate success appears to require a deep understanding of human speech recognition mechanisms. This talk will analyze the various human sub-systems, including the linguistic-concept generator, motor control, articulation, vocal-tract acoustic propagation, the ears, auditory pathways, and auditory cortex, which work in synergy to accomplish the remarkable task of highly robust, low-error speech recognition/perception and understanding. How can we abstract the essence of such human information processing power in building a computer system with similar (or better) performance? How can we build mathematical models that enable the development of advanced machine-learning algorithms and techniques that will run efficiently on a computer? Is it possible to explore and exploit some special power of computing machines, inherently lacking in the human system, so as to achieve super-human speech recognition? These are some of the issues to be addressed in this talk.

Back to Schedule

Mari Ostendorf

Managing Spoken Documents

As storage costs drop and bandwidth increases, there has been a rapid growth of information available via the web or in online archives, raising problems of finding and interpreting collections of documents. Significant recent progress has been made in text retrieval, analysis, summarization and translation, but much of this work has focused on written language. Increasingly, speech and video signals are also available -- including TV and radio broadcasts, congressional records, oral histories, voicemail, call center recordings, etc. -- which can be thought of as ``spoken documents''. Because it takes longer to listen to audio than to read text, spoken documents are clearly a prime candidate for automatic indexing, information extraction, and other such technologies. In this talk, we overview speech processing technology that underlies spoken document management, including mathematical frameworks for both word and metadata recognition, and for integrating video and language cues. In addition, we discuss issues that arise in text processing when moving from written to spoken language and implications for statistical models of language.

Back to Schedule

George Tzanetakis

A personal history of Music Information Retrieval

Music Information Retrieval (MIR) is an emerging research area that explores how large digital collections of music can be effectively analyzed for searching and browsing. It combines ideas from many different fields including Signal Processing, Machine Learning, Music Cognition, and Human-Computer Interaction. In this talk, I will give a historical overview of MIR with specific emphasis on topics with which I have more personal experience, such as audio feature extraction, automatic musical genre classification, rhythm analysis, query-by-humming, and sensor-enhanced musical instruments. I will conclude the talk by making predictions about the future of MIR and how it will radically transform the way music is produced, distributed and consumed.

Back to Schedule

Jose Moura

Deterministic and Stochastic, Time and Space Signal Models: An Algebraic Approach

We are all familiar with (infinite) "time" signal processing: time shifts, filters and convolution, signals, Fourier and z-transforms, spectrum, fast algorithms. Images, of course, are not "time" but "space" objects. Also, they are "finite" objects, i.e., defined over a finite indexing set. What is the natural concept of space shift, of space filter and convolution, spectral analysis, or "z"-transform, as well as many other related concepts? To address these questions, we go beyond linear algebra to present an algebraic approach where time (signal) and space (image) processing are instantiations of the same mathematical structure. The basic building block is the signal model - a triplet (A, M, φ) of an algebra A of filters, a module M of signals, and a generalization of the z-transform as a bijective linear mapping φ from a vector space into the module of signals. The shift is naturally interpreted as a generator of the algebra of filters, boundary conditions connect finite with infinite indexing sets, the trigonometric transforms (e.g., DCTs) are appropriate Fourier transforms, and the C-transform is the z-transform. More than a mathematical curiosity, the algebraic approach provides the appropriate structure to extend signal and image processing beyond uniform to other grids (e.g., hexagonal or quincunx), or develop fast algorithms from a few basic principles, from which we can also derive new fast algorithms for existing and new transforms. Connections with other image models, in particular, with Gauss Markov fields, and pinned Markov diffusions will be discussed. This talk overviews recent work with Markus Pueschel on the algebraic theory of signal and image processing.

[1] Pueschel and Moura, "Algebraic Theory of Signal Processing," submitted.
[2] Pueschel and Moura, "The Algebraic Approach to the DCT and DST and their Fast Algorithms," SIAM Journal on Computing, 35(5):1280-1316, Mar 2003.
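One concrete instance of the "shift as algebra generator" viewpoint can be checked numerically: the undirected one-dimensional space shift with zero boundary conditions is diagonalized by a discrete sine transform (DST-I), in the same way the directed time shift is diagonalized by the DFT. (This is a standard instance chosen for illustration; the papers above treat the general theory, including the DCTs.)

```python
import numpy as np

N = 8
# Undirected "space shift" with zero (Dirichlet) boundary conditions:
# x[j] -> x[j-1] + x[j+1], as a tridiagonal matrix.
A = np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)

# DST-I basis vectors and the eigenvalues 2*cos(pi*m/(N+1)).
j = np.arange(1, N + 1)
S = np.sin(np.pi * np.outer(j, j) / (N + 1))   # column m-1: m-th eigenvector
lam = 2 * np.cos(np.pi * j / (N + 1))

diagonalized = np.allclose(A @ S, S * lam)
```

Changing the boundary conditions of the shift changes which trigonometric transform appears as the Fourier transform of the module, which is the algebraic origin of the family of DCTs and DSTs.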

Back to Schedule

Panos Nasiopoulos and Kostas Plataniotis

Digital Video for Mobile Devices

Mobile wireless technologies and digital video broadcast technologies are gradually converging, with efforts from 3GPP and DVB 2.0 to complete this merging in the upcoming generations of mobile technologies. In order to support this convergence, existing video technologies have to be upgraded to ensure the reliability and quality of the delivered content. This calls for highly efficient video codecs, in addition to reliable error resilience techniques, that overcome the bandwidth constraints and highly error-prone conditions of wireless networks.

A Unified Framework for the Consumer-Grade Image Pipeline

In this talk a new modeling and processing approach suitable for consumer-grade image processing will be presented.

Using vector modeling principles, nonlinear image operators and adaptive filtering concepts, single-sensor camera image processing problems are treated from a global viewpoint yielding new classes of processing solutions.

Some of the varied applications of the framework will be covered; namely spectral interpolation (demosaicking), spatial interpolation of the acquired (mosaic-like) single-sensor gray-scale images as well as demosaicked full-color images, demosaicked image post-processing and color image enhancement, camera image denoising and sharpening, camera image compression, spatio-temporal video demosaicking, and camera image indexing and rights management.
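A minimal sketch of the spectral interpolation (demosaicking) step, assuming an RGGB Bayer layout and plain bilinear filling, which is a baseline far simpler than the adaptive vector operators of the framework:

```python
import numpy as np

def bayer_mosaic(rgb):
    """Sample an RGGB Bayer pattern from a full-colour image."""
    H, W, _ = rgb.shape
    mosaic = np.zeros((H, W))
    masks = np.zeros((H, W, 3), bool)
    masks[0::2, 0::2, 0] = True          # R sites
    masks[0::2, 1::2, 1] = True          # G sites (even rows)
    masks[1::2, 0::2, 1] = True          # G sites (odd rows)
    masks[1::2, 1::2, 2] = True          # B sites
    for c in range(3):
        mosaic[masks[..., c]] = rgb[..., c][masks[..., c]]
    return mosaic, masks

def conv2(img, k):
    """2-D 'same' convolution with zero padding (no SciPy dependency)."""
    ph, pw = k.shape[0] // 2, k.shape[1] // 2
    p = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros_like(img)
    for i in range(k.shape[0]):
        for j in range(k.shape[1]):
            out += k[i, j] * p[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def demosaic_bilinear(mosaic, masks):
    """Fill each colour plane from its samples by bilinear interpolation.
    Exact in the interior for locally constant colours; borders suffer
    from the zero padding."""
    kG = np.array([[0, 1, 0], [1, 4, 1], [0, 1, 0]]) / 4.0
    kRB = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 4.0
    out = np.zeros(masks.shape)
    for c, k in zip(range(3), (kRB, kG, kRB)):
        out[..., c] = conv2(mosaic * masks[..., c], k)
    return out
```

For a constant-colour patch this reproduces the original values exactly away from the image border; the framework's adaptive operators are aimed precisely at the edges and textures where this baseline produces colour fringes.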

Results obtained using the framework will be provided. The list of topics to be covered, while certainly not exhaustive, gives a good indication of the usefulness, and often the necessity, of our framework in consumer-grade image processing.

Open research problems and other potential applications of the framework will also be discussed.

Back to Schedule

Michele Effros

Information Representation for Network Systems

Back to Schedule

Shahram Shirani

Analytical modeling of matching pursuit

Matching pursuit is a greedy algorithm that decomposes a signal over a redundant dictionary of basis functions. It has recently found applications in many areas, including image and video processing. In this talk an analytical model of the operation of the matching pursuit algorithm on uniformly distributed signals is proposed. The model expresses the relationship between the bit rate and matching pursuit coder parameters such as dictionary size, quantization step size, distortion and the dimension of the signal. This relationship can be used to optimize the dictionary size and quantization step size for minimum bit rate. The model is verified through experimental results and its accuracy is validated for different system parameters.
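The algorithm being modeled can be stated in a few lines. A minimal sketch with a random unit-norm dictionary (the abstract's uniformly distributed signal model and quantization step are omitted here):

```python
import numpy as np

rng = np.random.default_rng(4)

def matching_pursuit(x, D, n_atoms):
    """Greedy MP: repeatedly pick the dictionary atom (column of D,
    assumed unit-norm) most correlated with the residual."""
    residual = x.copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_atoms):
        corr = D.T @ residual
        k = int(np.argmax(np.abs(corr)))
        coeffs[k] += corr[k]
        residual -= corr[k] * D[:, k]
    return coeffs, residual

# Redundant dictionary: 256 unit-norm atoms for 64-dimensional signals.
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)

x = rng.standard_normal(64)
coeffs, residual = matching_pursuit(x, D, n_atoms=40)
```

Two invariants make the rate-distortion analysis tractable: x always equals D @ coeffs + residual, and the residual energy is non-increasing in the number of atoms selected.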

Back to Schedule

Thrassos Pappas

Mathematical and Perceptual Models for Image Segmentation

We consider the segmentation of images of natural scenes. One of the challenges of this problem is that the statistical characteristics of perceptually uniform regions are spatially varying due to effects of lighting, perspective, scale changes, etc. A second challenge is the extraction of perceptually relevant information. First, we consider the problem of segmenting images of objects with smooth surfaces. The images are modeled as smooth spatially varying functions with sharp discontinuities at the segment boundaries, plus white Gaussian noise. We discuss an adaptive clustering algorithm for segmentation. It is a generalization of the K-means clustering algorithm that includes spatial constraints and accounts for local intensity variations in the image. The spatial constraints are modeled through the use of Gibbs/Markov random fields, while the local intensity variations are accounted for in an iterative procedure involving averaging over a sliding window whose size decreases as the algorithm progresses. We also consider a hierarchical implementation that results in better performance and computational efficiency. We then discuss an adaptive perceptual color-texture segmentation algorithm that is based on low-level features for color and texture. It combines knowledge of human perception with an understanding of signal characteristics in order to segment natural scenes into perceptually/semantically uniform regions. It is based on two types of spatially adaptive low-level features. The first describes the local color composition in terms of spatially adaptive dominant colors, and the second describes the spatial characteristics of the grayscale component of the texture. Key segmentation parameters are determined on the basis of subjective tests. The resulting segmentations convey semantic information that can be used for content-based retrieval.
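The non-spatial baseline that the adaptive clustering algorithm generalizes is plain K-means on pixel intensities. A minimal sketch (without the Gibbs/MRF constraints or the sliding-window local means of the talk's method):

```python
import numpy as np

rng = np.random.default_rng(5)

def kmeans_segment(img, K, iters=20):
    """Plain K-means on pixel intensities: alternate nearest-centre
    labelling and centre re-estimation."""
    x = img.ravel()
    centers = np.quantile(x, (np.arange(K) + 0.5) / K)   # spread-out init
    for _ in range(iters):
        labels = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
        for k in range(K):
            if np.any(labels == k):
                centers[k] = x[labels == k].mean()
    return labels.reshape(img.shape), centers

# Two-region synthetic image with additive Gaussian noise.
img = np.zeros((32, 32))
img[:, 16:] = 1.0
img += 0.1 * rng.standard_normal(img.shape)

labels, centers = kmeans_segment(img, K=2)
```

On this easy synthetic image intensity alone suffices; the spatial constraints and local adaptation in the talk's algorithm are what make the approach work when region statistics drift across the image.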

Back to Schedule

Ling Guan

A New Framework for Modeling and Recognizing Human Movement and Actions

Human-computer interaction (HCI) is a key research area in many scientific disciplines. I will start the talk with an overview of concepts, history and recent developments in HCI: face, speech, gesture, human emotion and human actions, with emphasis on emotion and action recognition. I will then focus on a fundamental but under-investigated research area in HCI: modeling and recognizing human movement and actions. Inspired by the movement notation systems used in dance and by the paradigm of phonemes used in continuous speech recognition, we developed a Continuous Human Movement Recognition (CHMR) framework. The framework is based on a novel paradigm: the alphabet of dynemes, the smallest contrastive dynamic units of human movement. A Differential Evolution-Monte Carlo particle filter is introduced, which has demonstrated highly effective and robust characteristics in tracking basic human movement skills. Using multiple hidden Markov models, the recognition process attempts to infer the human movement skill that could have produced the observed sequence of dynemes. Recent anthropometric data show that the famous "average-sized human" model in Leonardo da Vinci's drawing of the human figure is a fallacy, and that there is no one who is average in 10 dimensions. Incorporating highly accurate biometric features into the CHMR framework, we have been able to demonstrate the effectiveness of the framework in biometrics, biomedical analysis, and recognition of human skills. We are convinced that this framework will potentially form the enabling technology for biometric authentication systems in a broad range of applications such as security/surveillance, biomedicine/physiotherapy, special effects in motion picture production, digital asset management, battlefield surveillance, and coaching/training/judging in sports and performing arts, to name a few.

Back to Schedule

Jie Liang

Time Domain Lapped Transform and Its Applications

In this talk, the theory and applications of time domain lapped transform will be reviewed, including the design of fast transform, its application in wavelet-based image and video coding, and error resilient design for multiple description coding.

Back to Schedule

Alfred Hero

Dimension reduction for classification

There has been intense interest in analysis of massively complex data sets with thousands of dimensions. Dimension reduction methods are critical components of any analysis method due to the requirements of computation and noise reduction. In this talk we will present new variational methods of dimension reduction that explicitly target classification, anomaly detection, or other tasks.
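As a point of contrast for the task-driven methods in the talk, the simplest task-agnostic baseline is PCA. A minimal sketch on synthetic two-class data where the discriminative direction happens to carry most of the variance (when it does not, which is exactly the case motivating classification-targeted dimension reduction, PCA fails):

```python
import numpy as np

rng = np.random.default_rng(6)

# Two classes in 50 dimensions, separated along axis 0.
n, d = 200, 50
X = rng.standard_normal((2 * n, d))
y = np.repeat([0, 1], n)
X[y == 1, 0] += 4.0

# PCA via SVD of the centred data; project onto the first component.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
proj = Xc @ Vt[0]

# A threshold at zero separates the classes (sign of the PC is arbitrary,
# so test both orientations).
acc = max(float(((proj > 0) == y).mean()), float(((proj < 0) == y).mean()))
```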

Back to Schedule

Ghassan Hamarneh

Deformable Models for Image Analysis; From 'Snakes' to 'Organisms'

I will start by giving a short overview on image segmentation and registration. I will then focus on Deformable models ('Snakes' and others) for image segmentation and mention issues related to incorporating prior knowledge. I will then present our work on 'deformable organisms', an artificial-life framework for image analysis, incorporating high-level, intelligent, intuitive control of shape deformations. Various application examples will be presented throughout the talk.

Back to Schedule



© 2005 Banff International Research Station