Episodes

  • Talking Papers Podcast Episode: "Cameras as Rays: Pose Estimation via Ray Diffusion" with Jason Zhang


    Welcome to the latest episode of the Talking Papers Podcast! This week's guest is Jason Zhang, a PhD student at the Robotics Institute at Carnegie Mellon University, who joined us to discuss his paper "Cameras as Rays: Pose Estimation via Ray Diffusion", published at ICLR 2024.

    Jason's research homes in on the pivotal task of estimating camera poses for 3D reconstruction, a challenge that becomes much harder with sparse views. His paper proposes an inventive representation that treats a camera pose as a bundle of rays. This perspective pays off, delivering promising results even when only a few views are available.
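    To make the ray-bundle idea a bit more concrete, here is a small sketch (my own illustration, not code from the paper) of how a pinhole camera's pose can be turned into a bundle of rays, one per cell of a coarse image grid, written as Plücker coordinates (a direction plus a moment), which is the kind of per-patch ray parameterization the paper builds on:

    ```python
    import numpy as np

    def camera_to_ray_bundle(K, R, t, grid=16):
        """Convert a pinhole camera (intrinsics K, world-to-camera rotation R,
        translation t) into a bundle of rays, one per cell of a grid x grid
        pattern over the image plane. Each ray is returned as a Plucker pair
        (direction d, moment m = o x d)."""
        cam_center = -R.T @ t                      # shared origin of all rays
        # pixel centers of a coarse grid over the image plane
        u, v = np.meshgrid(np.linspace(0.5, grid - 0.5, grid),
                           np.linspace(0.5, grid - 0.5, grid))
        pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
        dirs_cam = pix @ np.linalg.inv(K).T        # back-project through intrinsics
        dirs_world = dirs_cam @ R                  # rotate into the world frame (R^T d)
        dirs_world /= np.linalg.norm(dirs_world, axis=-1, keepdims=True)
        moments = np.cross(cam_center[None, :], dirs_world)
        return np.concatenate([dirs_world, moments], axis=-1)   # (grid*grid, 6)

    # toy usage: identity camera looking down +z
    K = np.array([[16.0, 0, 8], [0, 16.0, 8], [0, 0, 1]])
    rays = camera_to_ray_bundle(K, np.eye(3), np.zeros(3))
    print(rays.shape)  # (256, 6)
    ```

    The paper works in the opposite direction as well: the diffusion model denoises such ray bundles, and the camera is then recovered from the cleaned-up rays.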

    What's particularly exciting is that both the regression-based and the diffusion-based variants of his method achieve top performance on camera pose estimation on CO3D, and generalize effectively to unseen object categories and in-the-wild captures.

    Throughout our conversation, Jason explained his insightful approach and how the denoising diffusion model and set-level transformers come into play to yield these impressive results. I found his technique a breath of fresh air in the field of camera pose estimation, notably in the formulation of both regression and diffusion models.

    On a more personal note, Jason and I didn't know each other before this podcast, so it was fantastic learning about his journey from the Bay Area to Pittsburgh. His experiences truly enriched our discussion and made this one of our most memorable episodes yet.

    We hope you find this episode as enlightening to listen to as it was for us to create. If you enjoyed our chat, don't forget to subscribe for more thought-provoking discussions with early-career academics and PhD students. Leave a comment below sharing your thoughts on Jason's paper!

    Until next time, keep following your curiosity and questioning the status quo.

    #TalkingPapersPodcast #ICLR2024 #CameraPoseEstimation #3DReconstruction #RayDiffusion #PhDResearchers #AcademicResearch #CarnegieMellonUniversity #BayArea #Pittsburgh

    All links and resources are available in the blogpost: https://www.itzikbs.com/cameras-as-rays

    🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com

    📧Subscribe to our mailing list: http://eepurl.com/hRznqb

    🐦Follow us on Twitter: https://twitter.com/talking_papers

    🎥YouTube Channel: https://bit.ly/3eQOgwP

  • Welcome to another exciting episode of the Talking Papers Podcast! In this episode, I had the pleasure of hosting Jiahao Li, a talented PhD student at Toyota Technological Institute at Chicago (TTIC), who discussed his groundbreaking research paper titled "Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model". This paper, published in ICLR 2024, introduces a novel method that revolutionizes text-to-3D generation.

    Instant3D addresses the limitations of existing methods with a two-stage approach. First, a fine-tuned 2D text-to-image diffusion model generates a set of four structured and consistent views from the given text prompt. Then, a transformer-based sparse-view reconstructor directly regresses the NeRF from the generated images. The results are stunning: high-quality and diverse 3D assets are produced within a mere 20 seconds, making it about a hundred times faster than previous optimization-based methods.
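    To make the two-stage structure explicit, here is a purely structural sketch with placeholder stubs (my own, clearly not the released model, which as noted below is proprietary):

    ```python
    import numpy as np

    # Placeholder stubs standing in for the two stages described above; the real
    # models (a fine-tuned text-to-image diffusion model and a transformer-based
    # large reconstruction model) are not publicly released.
    def generate_four_views(prompt: str) -> np.ndarray:
        """Stage 1: a single diffusion pass producing four consistent views."""
        return np.zeros((4, 256, 256, 3), dtype=np.float32)   # RGB placeholder

    def reconstruct_nerf(views: np.ndarray) -> dict:
        """Stage 2: feed-forward sparse-view reconstructor regressing NeRF params."""
        return {"nerf_params": np.zeros((64, 64, 32), dtype=np.float32)}  # placeholder

    def instant3d(prompt: str) -> dict:
        views = generate_four_views(prompt)   # seconds: one diffusion run, no per-scene loop
        return reconstruct_nerf(views)        # a single forward pass, no optimization

    asset = instant3d("a ceramic teapot shaped like a pumpkin")
    ```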

    As a 3D enthusiast myself, I found the outcomes of Instant3D truly captivating, especially considering the short amount of time it takes to generate them. While it's unusual for a 3D person like me to experience these creations through a 2D projection, the astonishing results make it impossible to ignore the potential of this approach. This paper underscores the importance of obtaining more and better 3D data, paving the way for exciting advancements in the field.

    Let me share a little anecdote about our guest, Jiahao Li. We were initially introduced through Yicong Hong, another brilliant guest on our podcast. Yicong, who was a PhD student at ANU during my postdoc there, interned at Adobe together with Jiahao while working on this very paper, and he also happens to be a co-author of Instant3D. It's incredible to see such brilliant minds coming together on groundbreaking research projects.

    Now, unfortunately, the model developed in this paper is not publicly available. However, given the computational resources required to train these advanced models and obvious copyright issues, it's understandable that Adobe has chosen to keep it proprietary. Not all of us have a hundred GPUs lying around, right?

    Remember to hit that subscribe button and join the conversation in the comments section. Let's delve into the exciting world of Instant3D with Jiahao Li on this episode of Talking Papers Podcast!

    #TalkingPapersPodcast #ICLR2024 #Instant3D #TextTo3D #ResearchPapers #PhDStudents #AcademicResearch

    All links and resources are available in the blogpost: https://www.itzikbs.com/instant3d

    🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com

    📧Subscribe to our mailing list: http://eepurl.com/hRznqb

    🐦Follow us on Twitter: https://twitter.com/talking_papers

    🎥YouTube Channel: https://bit.ly/3eQOgwP

    In this exciting episode of #TalkingPapersPodcast, we have the pleasure of hosting Ana Dodik, a second-year PhD student at MIT. We delve into her research paper titled "Variational Barycentric Coordinates." Published at SIGGRAPH Asia 2023, this paper significantly contributes to our understanding of the optimization of generalized barycentric coordinates.

    The paper introduces a robust variational technique that offers more control than existing models. Traditional approaches are restrictive because they represent barycentric coordinates with meshes or closed-form formulae. Ana's research sidesteps these limits by using a neural field to directly parameterize the continuous function that maps any point in a polytope's interior to its barycentric coordinates. A thorough theoretical characterization of barycentric coordinates is the backbone of this formulation. The paper demonstrates the versatility of the model with a variety of objective functions and also suggests a practical acceleration strategy.
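    For readers who like code, here is a minimal toy sketch (mine, not the authors' implementation) of the central idea: a small neural field maps an interior point to one weight per vertex, a softmax guarantees non-negativity and partition of unity by construction, and a reproduction loss is one example of the kind of objective such a field can be trained with:

    ```python
    import torch
    import torch.nn as nn

    class BarycentricField(nn.Module):
        """Toy neural field: maps a 2D point inside a polygon to one weight per
        vertex. Softmax enforces w_i >= 0 and sum_i w_i = 1 by construction."""
        def __init__(self, n_vertices: int, hidden: int = 64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(2, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, n_vertices))
        def forward(self, x):                      # x: (B, 2)
            return torch.softmax(self.net(x), dim=-1)

    # square polygon and random interior query points
    verts = torch.tensor([[0., 0.], [1., 0.], [1., 1.], [0., 1.]])
    field = BarycentricField(n_vertices=4)
    opt = torch.optim.Adam(field.parameters(), lr=1e-3)
    for step in range(200):
        x = torch.rand(256, 2)                     # points inside the unit square
        w = field(x)                               # (256, 4) barycentric weights
        recon = w @ verts                          # reproduction: sum_i w_i * v_i
        loss = ((recon - x) ** 2).mean()           # linear-reproduction constraint
        opt.zero_grad(); loss.backward(); opt.step()
    ```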

    My take on this: the tool could be very useful for artists, and I'm eagerly anticipating their feedback on how it performs in practice. Melding classical geometry processing with newer neural-field methods, this research stands as a testament to how far the field has come.

    My talk with Ana was delightfully enriching. Recording online, we discussed how, thanks to improvements in technology, now is a great time to pursue a PhD.

    Remember to hit the subscribe button and leave a comment about your thoughts on Ana's research. We'd love to hear your insights and engage in discussions to further this fascinating discourse in academia.

    All links and resources are available in the blogpost: https://www.itzikbs.com/variational-barycentric-coordinates

    🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com

    📧Subscribe to our mailing list: http://eepurl.com/hRznqb

    🐦Follow us on Twitter: https://twitter.com/talking_papers

    🎥YouTube Channel: https://bit.ly/3eQOgwP

  • Welcome to another exciting episode of the Talking Papers Podcast! In this episode, we delve into the fascinating world of self-supervised learning with our special guest, Ravid Shwartz-Ziv. Together, we explore and dissect their research paper titled "Reverse Engineering Self-Supervised Learning," published in NeurIPS 2023.

    Self-supervised learning (SSL) has emerged as a game-changing technique in the field of machine learning. However, understanding the learned representations and their underlying mechanisms has remained a challenge - until now. Ravid Shwartz-Ziv's paper provides an in-depth empirical analysis of SSL-trained representations, encompassing various models, architectures, and hyperparameters.

    The study uncovers a captivating aspect of the SSL training process - its inherent ability to facilitate the clustering of samples based on semantic labels. Surprisingly, this clustering is driven by the regularization term in the SSL objective. Not only does this process enhance downstream classification performance, but it also exhibits a remarkable power of data compression. The paper further establishes that SSL-trained representations align more closely with semantic classes than random classes, even across different hierarchical levels. What's more, this alignment strengthens during training and as we venture deeper into the network.
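    If you want to poke at this clustering effect in your own SSL embeddings, a rough recipe (my own illustration, far simpler than the analysis in the paper) is to cluster the representations and measure their agreement with held-out semantic labels:

    ```python
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import normalized_mutual_info_score

    def semantic_clustering_score(embeddings: np.ndarray, labels: np.ndarray) -> float:
        """Cluster SSL embeddings and measure how well the clusters align with
        the (held-out) semantic labels. Higher NMI = stronger semantic clustering."""
        k = len(np.unique(labels))
        cluster_ids = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(embeddings)
        return normalized_mutual_info_score(labels, cluster_ids)

    # toy demo with synthetic "embeddings": three well-separated Gaussian blobs
    rng = np.random.default_rng(0)
    labels = np.repeat([0, 1, 2], 100)
    embeddings = rng.normal(size=(300, 16)) + labels[:, None] * 5.0
    print(semantic_clustering_score(embeddings, labels))   # close to 1.0
    ```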

    Join us as we discuss the insights gained from this exceptional research. One remarkable aspect of the paper is its departure from the trend of focusing solely on outperforming competitors. Instead, it dives deep into understanding the semantic clustering effect of SSL techniques, shedding light on the underlying capabilities of the tools we commonly use. It is truly a genre of research that holds immense value.

    During our conversation, Ravid Shwartz-Ziv - a CDS Faculty Fellow at NYU Center for Data Science - shares their perspectives and insights, providing an enriching layer to our exploration. Interestingly, despite both of us being in Israel at the time of recording, we had never met in person, highlighting the interconnectedness and collaborative nature of the academic world.

    Don't miss this thought-provoking episode that promises to expand your understanding of self-supervised learning and its impact on representation learning mechanisms. Subscribe to our channel now, join the discussion, and let us know your thoughts in the comments below!



    All links and resources are available in the blogpost: https://www.itzikbs.com/revenge_ssl

    🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com

    📧Subscribe to our mailing list: http://eepurl.com/hRznqb

    🐦Follow us on Twitter: https://twitter.com/talking_papers

    🎥YouTube Channel: https://bit.ly/3eQOgwP

  • Welcome to another exciting episode of the Talking Papers Podcast! In this installment, I had the pleasure of hosting the brilliant Zoë Marschner as we delved into the fascinating world of Constructive Solid Geometry on Neural Signed Distance Fields. This exceptional research paper, published in SIGGRAPH Asia 2023, explores the cutting-edge potential of neural networks in shaping geometric representations.

    In our conversation, Zoë enlightened us on the challenges surrounding the editing of shapes encoded by neural Signed Distance Fields (SDFs). While common geometric operators seem like a promising solution, they often result in incorrect outputs known as Pseudo-SDFs, rendering them unusable for downstream tasks. However, fear not! Zoë and her team have galvanized this field with groundbreaking insights.

    They characterize the space of Pseudo-SDFs and propose a novel regularizer called the closest point loss. This ingenious technique encourages the output to be an exact SDF, ensuring accurate shape representation. Their findings have profound implications for operations like CSG (Constructive Solid Geometry) and swept volumes, revolutionizing their applications in fields such as computer-aided design (CAD).
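    To give a flavour of the problem and the fix, here is a small toy (my paraphrase of the idea, not the paper's exact loss): a CSG intersection built with max() has the right zero level set but is only a Pseudo-SDF away from it, and a closest-point-style residual can detect, and when used as a loss, penalize, that deviation:

    ```python
    import torch

    def pseudo_sdf_intersection(f1, f2):
        """CSG intersection via max(): the zero level set is correct, but away
        from the surface the result is generally a Pseudo-SDF, i.e. it no
        longer reports true distances."""
        return lambda x: torch.maximum(f1(x), f2(x))

    def closest_point_residual(f, x):
        """Toy regularizer in the spirit of a closest-point loss (my paraphrase,
        not the paper's exact formulation): step from x to its predicted closest
        surface point x - f(x) * grad f / |grad f| and penalize the field value
        there; the residual vanishes when f behaves like a true SDF around x."""
        x = x.requires_grad_(True)
        d = f(x)
        (grad,) = torch.autograd.grad(d.sum(), x, create_graph=True)
        n = grad / (grad.norm(dim=-1, keepdim=True) + 1e-9)
        proj = x - d.unsqueeze(-1) * n
        return f(proj).abs().mean()

    # two overlapping unit spheres; their max() is a Pseudo-SDF near the crease
    s1 = lambda p: p.norm(dim=-1) - 1.0
    s2 = lambda p: (p - torch.tensor([1.0, 0.0, 0.0])).norm(dim=-1) - 1.0
    x = torch.randn(2048, 3) + torch.tensor([0.5, 0.0, 0.0])
    print(closest_point_residual(pseudo_sdf_intersection(s1, s2), x))  # > 0
    ```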

    As a former mechanical engineer, I find the concept of combining CSGs with Neural Signed Distance fields to be immensely empowering. The potential for creating intricate and precise designs is mind-boggling!

    On a personal note, I couldn't be more thrilled about this episode. Not only were two of the co-authors, Derek and Silvia, previous guests on the podcast, but I also had the pleasure of virtually meeting Zoë for the first time. Recording this episode with her was an absolute blast, and I must say, her enthusiasm and expertise shine through, despite being in the early stages of her career. It's worth mentioning that she has even collaborated with some of the most senior figures in the field!

    Join us on this captivating journey into the world of Neural Signed Distance Fields. Don't forget to subscribe and leave your thoughts in the comments section below. We would love to hear your take on this groundbreaking research!

    All links and resources are available in the blogpost: https://www.itzikbs.com/CSG_on_NSDF

    #TalkingPapersPodcast #SIGGRAPHAsia2023 #SDFs #CSG #shapeediting #neuralnetworks #CAD #research

    🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com

    📧Subscribe to our mailing list: http://eepurl.com/hRznqb

    🐦Follow us on Twitter: https://twitter.com/talking_papers

    🎥YouTube Channel: https://bit.ly/3eQOgwP

    🎙️Join us on this exciting episode of the Talking Papers Podcast as we sit down with the talented Sadegh Aliakbarian to explore his groundbreaking ICCV 2023 paper "HMD-NeMo: Online 3D Avatar Motion Generation From Sparse Observations". Our guest takes us on a journey through this pivotal research, which addresses a crucial aspect of immersive mixed reality experiences.

    🌟 The quality of these experiences hinges on generating plausible and precise full-body avatar motion, a challenge given the limited input signals provided by Head-Mounted Devices (HMDs), typically head and hands 6-DoF. While recent approaches have made strides in generating full-body motion from such inputs, they assume full hand visibility. This assumption, however, doesn't hold in scenarios without motion controllers, relying instead on egocentric hand tracking, which can lead to partial hand visibility due to the HMD's field of view.

    🧠 "HMD-NeMo" presents a groundbreaking solution, offering a unified approach to generating realistic full-body motion even when hands are only partially visible. This lightweight neural network operates in real-time, incorporating a spatio-temporal encoder with adaptable mask tokens, ensuring plausible motion in the absence of complete hand observations.


    👤 Sadegh is currently a senior research scientist at Microsoft Mixed Reality and AI Lab-Cambridge (UK), where he's at the forefront of Microsoft Mesh and avatar motion generation. He holds a PhD from the Australian National University, where he specialized in generative modeling of human motion. His research journey includes internships at Amazon AI, Five AI, and Qualcomm AI Research, focusing on generative models, representation learning, and adversarial examples.

    🤝 We first crossed paths during our time at the Australian Centre for Robotic Vision (ACRV), where Sadegh was pursuing his PhD, and I was embarking on my postdoctoral journey. During this time, I had the privilege of collaborating with another co-author of the paper, Fatemeh Saleh, who also happens to be Sadegh's life partner. It's been incredible to witness their continued growth.

    🚀 Join us as we uncover the critical advancements brought by "HMD-NeMo" and their implications for the future of mixed reality experiences. Stay tuned for the episode release!

    All links and resources are available in the blogpost: https://www.itzikbs.com/hmdnemo

    🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com

    📧Subscribe to our mailing list: http://eepurl.com/hRznqb

    🐦Follow us on Twitter: https://twitter.com/talking_papers

    🎥YouTube Channel: https://bit.ly/3eQOgwP

  • Join us on this exciting episode of the Talking Papers Podcast as we sit down with the brilliant Jeong Joon Park to explore his groundbreaking paper, "CC3D: Layout-Conditioned Generation of Compositional 3D Scenes," just published at ICCV 2023.

    Discover CC3D, a game-changing conditional generative model redefining 3D scene synthesis. Unlike traditional 3D GANs, CC3D boldly crafts complex scenes with multiple objects, guided by 2D semantic layouts. With a novel 3D field representation, CC3D delivers efficiency and superior scene quality. Get ready for a deep dive into the future of 3D scene generation.

    My journey with Jeong Joon Park began with his influential SDF paper at CVPR 2019. We met in person at CVPR 2022, thanks to Despoina, a mutual friend who has also been a guest on our podcast. Now, as an Assistant Professor at the University of Michigan CSE, JJ leads research in realistic 3D content generation, offering opportunities for students to contribute to the frontiers of computer vision and AI.

    Don't miss this insightful exploration of this ICCV 2023 paper and the future of 3D scene synthesis.


    CC3D: Layout-Conditioned Generation of Compositional 3D Scenes

    AUTHORS
    Sherwin Bahmani, Jeong Joon Park, Despoina Paschalidou, Xingguang Yan, Gordon Wetzstein, Leonidas Guibas, Andrea Tagliasacchi

    ABSTRACT
    In this work, we introduce CC3D, a conditional generative model that synthesizes complex 3D scenes conditioned on 2D semantic scene layouts, trained using single-view images. Different from most existing 3D GANs that limit their applicability to aligned single objects, we focus on generating complex scenes with multiple objects, by modeling the compositional nature of 3D scenes. By devising a 2D layout-based approach for 3D synthesis and implementing a new 3D field representation with a stronger geometric inductive bias, we have created a 3D GAN that is both efficient and of high quality, while allowing for a more controllable generation process. Our evaluations on synthetic 3D-FRONT and real-world KITTI-360 datasets demonstrate that our model generates scenes of improved visual and geometric quality in comparison to previous works.

    All links and resources are available on the blog post:
    https://www.itzikbs.com/cc3d

    Subscribe and stay tuned! 🚀🔍

    🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com

    📧Subscribe to our mailing list: http://eepurl.com/hRznqb

    🐦Follow us on Twitter: https://twitter.com/talking_papers

    🎥YouTube Channel: https://bit.ly/3eQOgwP

    Welcome to another exciting episode of the Talking Papers Podcast! In this installment, I had the pleasure of hosting Chenfeng Xu to discuss his paper "NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection", which was published at ICCV 2023.

    In recent times, NeRF has gained widespread prominence, and the field of 3D detection has encountered well-recognized challenges. The principal contribution of this study lies in its ability to address the detection task while simultaneously training a NeRF model and enabling it to generalize to previously unobserved scenes. Although the computer vision community has been actively addressing various tasks related to images and point clouds for an extended period, it is particularly invigorating to witness the application of NeRF representation in tackling this specific challenge.

    Chenfeng is currently a Ph.D. candidate at UC Berkeley, collaborating with Prof. Masayoshi Tomizuka and Prof. Kurt Keutzer. His affiliations include Berkeley DeepDrive (BDD) and Berkeley AI Research (BAIR), along with the MSC lab and PALLAS. His research endeavors revolve around enhancing computational and data efficiency in machine perception, with a primary focus on temporal-3D scenes and their downstream applications. He brings together traditionally separate approaches from geometric computing and deep learning to establish both theoretical frameworks and practical algorithms for temporal-3D representations. His work spans a wide range of applications, including autonomous driving, robotics, and AR/VR, and consistently demonstrates remarkable efficiency through extensive experimentation. I am eagerly looking forward to seeing his upcoming research papers.

    PAPER
    NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection

    AUTHORS
    Chenfeng Xu, Bichen Wu, Ji Hou, Sam Tsai, Ruilong Li, Jialiang Wang, Wei Zhan, Zijian He, Peter Vajda, Kurt Keutzer, Masayoshi Tomizuka

    ABSTRACT
    NeRF-Det is a novel method for 3D detection with posed RGB images as input. Our method makes novel use of NeRF in an end-to-end manner to explicitly estimate 3D geometry, thereby improving 3D detection performance. Specifically, to avoid the significant extra latency associated with per-scene optimization of NeRF, we introduce sufficient geometry priors to enhance the generalizability of NeRF-MLP. We subtly connect the detection and NeRF branches through a shared MLP, enabling an efficient adaptation of NeRF to detection and yielding geometry-aware volumetric representations for 3D detection. As a result of our joint-training design, NeRF-Det is able to generalize well to unseen scenes for object detection, view synthesis, and depth estimation tasks without per-scene optimization.


    All links and resources are available on the blog post:
    https://www.itzikbs.com/nerf-det

    🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com

    📧Subscribe to our mailing list: http://eepurl.com/hRznqb

    🐦Follow us on Twitter: https://twitter.com/talking_papers

    🎥YouTube Channel: https://bit.ly/3eQOgwP

  • Welcome to another exciting episode of the Talking Papers Podcast! In this installment, I had the pleasure of hosting Tomas Jakab to discuss his paper "MagicPony: Learning Articulated 3D Animals in the Wild" which was published at CVPR 2023.

    The motivation behind the MagicPony methodology stems from the challenge posed by the scarcity of labeled data, particularly when dealing with real-world scenarios involving freely moving articulated 3D animals. In response, the authors propose an innovative solution that addresses this issue. This novel approach takes an ordinary RGB image as input and produces a sophisticated 3D model with detailed shape, texture, and lighting characteristics. The method's uniqueness lies in its ability to learn from diverse images captured in natural settings, effectively deciphering the inherent differences between them. This enables the system to establish a foundational average shape while accounting for specific deformations that vary from instance to instance. To achieve this, the researchers blend the strengths of two techniques, radiance fields and meshes, which together contribute to the comprehensive representation of the object's attributes. Additionally, the method employs a strategic viewpoint sampling technique to enhance computational speed. While the current results may not be suitable for practical applications just yet, this endeavor constitutes a substantial advancement in the field, as demonstrated by the tangible improvements showcased both quantitatively and qualitatively.


    AUTHORS
    Shangzhe Wu*, Ruining Li*, Tomas Jakab*, Christian Rupprecht, Andrea Vedaldi

    ABSTRACT
    We consider the problem of learning a function that can estimate the 3D shape, articulation, viewpoint, texture, and lighting of an articulated animal like a horse, given a single test image. We present a new method, dubbed MagicPony, that learns this function purely from in-the-wild single-view images of the object category, with minimal assumptions about the topology of deformation. At its core is an implicit-explicit representation of articulated shape and appearance, combining the strengths of neural fields and meshes. In order to help the model understand an object's shape and pose, we distil the knowledge captured by an off-the-shelf self-supervised vision transformer and fuse it into the 3D model. To overcome common local optima in viewpoint estimation, we further introduce a new viewpoint sampling scheme that comes at no added training cost. Compared to prior works, we show significant quantitative and qualitative improvements on this challenging task. The model also demonstrates excellent generalisation in reconstructing abstract drawings and artefacts, despite the fact that it is only trained on real images.

    RELATED PAPERS
    📚CMR
    📚Deep Marching Tetrahedra
    📚DINO-ViT

    LINKS AND RESOURCES

    📚 Paper
    💻 Project page
    💻 Code


    CONTACT

    If you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: [email protected]


    All links are available in the blog post: https://www.itzikbs.com/magicpony

    🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com

    📧Subscribe to our mailing list: http://eepurl.com/hRznqb

    🐦Follow us on Twitter: https://twitter.com/talking_papers

    🎥YouTube Channel: https://bit.ly/3eQOgwP

  • All links are available in this blog post

    Welcome to another exciting episode of the Talking Papers Podcast! In this installment, I had the pleasure of hosting Shir Iluz to discuss her groundbreaking paper titled "Word-As-Image for Semantic Typography" which won the SIGGRAPH 2023 Honorable Mention award.

    This scientific paper introduces an innovative approach to text morphing based on semantic context. Using Bézier curves with control points, a differentiable rasterizer, and a vector diffusion model, the authors transform words like "bunny" into captivating bunny-shaped letters. Their optimization-based method accurately conveys the word's meaning. They address the readability-semantics balance with multiple loss functions, serving as "control knobs" for users to fine-tune results. The paper's compelling results are showcased in an impressive demo. Don't miss it!
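    Stripping away the rasterizer and the diffusion loss, the object being optimized is refreshingly simple: the control points of cubic Bézier curves. A self-contained toy (mine, not the authors' code) that samples a cubic Bézier differentiably and nudges its control points toward a target shape:

    ```python
    import torch

    def cubic_bezier(ctrl, n=64):
        """Sample a cubic Bezier curve from its 4 control points (4, 2) using the
        Bernstein basis; fully differentiable w.r.t. the control points."""
        t = torch.linspace(0.0, 1.0, n).unsqueeze(1)
        basis = torch.cat([(1 - t) ** 3, 3 * t * (1 - t) ** 2,
                           3 * t ** 2 * (1 - t), t ** 3], dim=1)       # (n, 4)
        return basis @ ctrl                                            # (n, 2)

    # optimize control points so the curve hugs a target arc; in the paper the
    # gradient would instead come from a differentiable rasterizer plus the
    # diffusion-based and legibility losses
    ctrl = torch.randn(4, 2, requires_grad=True)
    theta = torch.linspace(0, torch.pi / 2, 64)
    target = torch.stack([torch.cos(theta), torch.sin(theta)], dim=1)
    opt = torch.optim.Adam([ctrl], lr=0.05)
    for step in range(300):
        loss = ((cubic_bezier(ctrl) - target) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    print(loss.item())
    ```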

    Their work carries immense potential, promising to revolutionize the creative processes of artists and designers. Rather than commencing from a traditional blank canvas or plain font, this innovative approach enables individuals to initiate their logo design journey by transforming a word into a captivating image. The implications of this novel technique hold the power to reshape the very workflow of artistic expression, opening up exciting new possibilities for visual communication and design aesthetics.

    I am eagerly anticipating the next set of papers she will sketch out (pun intended).

    AUTHORS
    Shir Iluz, Yael Vinker, Amir Hertz, Daniel Berio, Daniel Cohen-Or, Ariel Shamir

    ABSTRACT
    A word-as-image is a semantic typography technique where a word illustration presents a visualization of the meaning of the word, while also preserving its readability. We present a method to create word-as-image illustrations automatically. This task is highly challenging as it requires semantic understanding of the word and a creative idea of where and how to depict these semantics in a visually pleasing and legible manner. We rely on the remarkable ability of recent large pretrained language-vision models to distill textual concepts visually. We target simple, concise, black-and-white designs that convey the semantics clearly. We deliberately do not change the color or texture of the letters and do not use embellishments. Our method optimizes the outline of each letter to convey the desired concept, guided by a pretrained Stable Diffusion model. We incorporate additional loss terms to ensure the legibility of the text and the preservation of the style of the font. We show high quality and engaging results on numerous examples and compare to alternative techniques.

    RELATED PAPERS
    📚VectorFusion

    LINKS AND RESOURCES
    📚 Paper
    💻 Project page
    💻 Code
    💻 Demo

    CONTACT

    If you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: [email protected]


    🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com

    📧Subscribe to our mailing list: http://eepurl.com/hRznqb

    🐦Follow us on Twitter: https://twitter.com/talking_papers

    🎥YouTube Channel: https://bit.ly/3eQOgwP

  • In this episode of the Talking Papers Podcast, I hosted Yawar Siddiqui to chat about his CVPR 2023 paper "Panoptic Lifting for 3D Scene Understanding with Neural Fields".

    All links are available in the blog post.

    In this paper, they proposed a new method for "lifting" 2D panoptic segmentation into a 3D volume represented as neural fields, using images of in-the-wild scenes. While the semantic segmentation part is simply represented as an MLP, the instance indices are hard to keep track of across the different frames. This is solved using the Hungarian algorithm (linear assignment) and a set of custom losses.
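    The per-frame instance matching is a linear assignment problem; here is a small illustrative sketch (not the authors' code) of how 2D instance masks in one view can be matched to the model's 3D instance slots with the Hungarian algorithm, using an overlap-based cost:

    ```python
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def match_instances(pred_probs: np.ndarray, gt_masks: np.ndarray) -> dict:
        """pred_probs: (K, H, W) per-3D-slot rendered instance probabilities.
        gt_masks:   (M, H, W) machine-generated 2D instance masks for this view.
        Returns a mapping {2D instance index -> 3D slot index} maximizing overlap."""
        cost = -np.einsum('khw,mhw->mk', pred_probs, gt_masks.astype(np.float32))
        rows, cols = linear_sum_assignment(cost)   # Hungarian algorithm
        return dict(zip(rows.tolist(), cols.tolist()))

    # toy example: 2 machine-generated masks, 3 available 3D instance slots
    gt = np.zeros((2, 8, 8), dtype=bool)
    gt[0, :4], gt[1, 4:] = True, True
    pred = np.random.rand(3, 8, 8)
    pred[1, :4] += 5.0   # slot 1 clearly explains mask 0
    print(match_instances(pred, gt))   # e.g. {0: 1, 1: ...}
    ```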

    Yawar is currently a PhD student at the Technical University of Munich (TUM) under the supervision of Prof. Matthias Niessner. This work was done as part of his latest internship with Meta Zurich. It was a pleasure chatting with him and I can't wait to see what he cooks up next.

    AUTHORS
    Yawar Siddiqui, Lorenzo Porzi, Samuel Rota Bulò, Norman Müller, Matthias Nießner, Angela Dai, Peter Kontschieder

    ABSTRACT
    We propose Panoptic Lifting, a novel approach for learning panoptic 3D volumetric representations from images of in-the-wild scenes. Once trained, our model can render color images together with 3D-consistent panoptic segmentation from novel viewpoints. Unlike existing approaches which use 3D input directly or indirectly, our method requires only machine-generated 2D panoptic segmentation masks inferred from a pre-trained network. Our core contribution is a panoptic lifting scheme based on a neural field representation that generates a unified and multi-view consistent, 3D panoptic representation of the scene. To account for inconsistencies of 2D instance identifiers across views, we solve a linear assignment with a cost based on the model's current predictions and the machine-generated segmentation masks, thus enabling us to lift 2D instances to 3D in a consistent way. We further propose and ablate contributions that make our method more robust to noisy, machine-generated labels, including test-time augmentations for confidence estimates, segment consistency loss, bounded segmentation fields, and gradient stopping. Experimental results validate our approach on the challenging Hypersim, Replica, and ScanNet datasets, improving by 8.4, 13.8, and 10.6% in scene-level PQ over state of the art.

    SPONSOR
    This episode was sponsored by YOOM. YOOM is an Israeli startup dedicated to volumetric video creation. They were voted as the 2022 best start-up to work for by Dun’s 100.
    Join their team that works on geometric deep learning research, implicit representations of 3D humans, NeRFs, and 3D/4D generative models.

    Visit YOOM

    For job opportunities with YOOM visit https://www.yoom.com/careers/

    CONTACT

    If you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: [email protected]

    This episode was recorded on July 6th, 2023.

    #talkingpapers #CVPR2023 #PanopticLifting #NeRF #TensoRF #AI #Segmentation #DeepLearning #MachineLearning #research #artificialintelligence #podcasts

    🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com

    📧Subscribe to our mailing list: http://eepurl.com/hRznqb

    🐦Follow us on Twitter: https://twitter.com/talking_papers

    🎥YouTube Channel: https://bit.ly/3eQOgwP

  • In this episode of the Talking Papers Podcast, I hosted Kejie Li to chat about his CVPR 2023 paper "MobileBrick: Building LEGO for 3D Reconstruction on Mobile Devices".

    All links are available in the blog post.

    In this paper, they proposed a new dataset and paradigm for evaluating 3D object reconstruction. It is very difficult to create a digital twin of a 3D object, even with expensive sensors. They introduce a new RGBD dataset captured with a mobile device, and the nice trick for obtaining the ground truth is that they used LEGO bricks that have an exact CAD model.

    Kejie is currently a research scientist at ByteDance/TikTok. When writing this paper he was a postdoc at Oxford; prior to that, he obtained his PhD from the University of Adelaide. Although we hadn't crossed paths until this episode, we have some common ground in our CVs, having been affiliated with different nodes of the ACRV (Adelaide for him and ANU for me). I'm excited to see what he comes up with next, and eagerly await his future endeavours.

    AUTHORS
    Kejie Li, Jia-Wang Bian, Robert Castle, Philip H.S. Torr, Victor Adrian Prisacariu

    ABSTRACT
    High-quality 3D ground-truth shapes are critical for 3D object reconstruction evaluation. However, it is difficult to create a replica of an object in reality, and even 3D reconstructions generated by 3D scanners have artefacts that cause biases in evaluation. To address this issue, we introduce a novel multi-view RGBD dataset captured using a mobile device, which includes highly precise 3D ground-truth annotations for 153 object models featuring a diverse set of 3D structures. We obtain precise 3D ground-truth shape without relying on high-end 3D scanners by utilising LEGO models with known geometry as the 3D structures for image capture. The distinct data modality offered by high-resolution RGB images and low-resolution depth maps captured on a mobile device, when combined with precise 3D geometry annotations, presents a unique opportunity for future research on high-fidelity 3D reconstruction. Furthermore, we evaluate a range of 3D reconstruction algorithms on the proposed dataset.

    RELATED PAPERS
    📚COLMAP
    📚NeRF
    📚NeuS
    📚CO3D

    LINKS AND RESOURCES
    📚 Paper
    💻Project page
    💻Code

    SPONSOR
    This episode was sponsored by YOOM. YOOM is an Israeli startup dedicated to volumetric video creation. They were voted as the 2022 best start-up to work for by Dun’s 100.
    Join their team that works on geometric deep learning research, implicit representations of 3D humans, NeRFs, and 3D/4D generative models.

    Visit YOOM

    For job opportunities with YOOM visit https://www.yoom.com/careers/


    CONTACT

    If you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: [email protected]

    This episode was recorded on May 8th, 2023.

    #talkingpapers #CVPR2023 #NeRF #Dataset #mobilebrick #ComputerVision #AI #NeuS #DeepLearning #MachineLearning #research #artificialintelligence #podcasts

    🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com

    📧Subscribe to our mailing list: http://eepurl.com/hRznqb

    🐦Follow us on Twitter: https://twitter.com/talking_papers

    🎥YouTube Channel: https://bit.ly/3eQOgwP

  • All links are available in the blog post.

    In this episode of the Talking Papers Podcast, I hosted Jiahao Zhang to chat about our CVPR 2023 paper "Aligning Step-by-Step Instructional Diagrams to Video Demonstrations".

    In this paper, we take on the task of aligning in-the-wild videos of people assembling furniture with the corresponding steps in the furniture assembly diagram. To do that, we collected and annotated a brand new dataset, "IKEA Assembly in the Wild", where we aligned YouTube videos with IKEA's instruction manuals. Our approach to addressing this task proposes several supervised contrastive losses that contrast between video and diagram, video and manual, and internal manual images.
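    As a rough illustration of what one such contrastive term can look like (a generic InfoNCE-style rendition of mine, not the exact losses from the paper), matched video-step and diagram embeddings are pulled together while mismatched pairs are pushed apart:

    ```python
    import torch
    import torch.nn.functional as F

    def video_diagram_contrastive(video_emb, diagram_emb, temperature=0.07):
        """Symmetric InfoNCE between B matched (video step, diagram) pairs.
        video_emb, diagram_emb: (B, D), row i of each side describes the same step."""
        v = F.normalize(video_emb, dim=-1)
        d = F.normalize(diagram_emb, dim=-1)
        logits = v @ d.t() / temperature                  # (B, B) similarity matrix
        targets = torch.arange(v.size(0), device=v.device)
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))

    loss = video_diagram_contrastive(torch.randn(8, 256), torch.randn(8, 256))
    print(loss.item())
    ```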

    Jiahao is currently a PhD student at the Australian National University. His research focus is on human action recognition and multi-modal representation alignment. We first met (virtually) when Jiahao did his Honours project, where he developed an amazing (and super useful) video annotation tool ViDaT. His strong software engineering and web development background gives him a strong advantage when working on his research projects. Even though we never met in person (yet), we are actively collaborating and I already know what he is cooking up next. I hope to share it with the world soon.

    AUTHORS
    Jiahao Zhang, Anoop Cherian, Yanbin Liu, Yizhak Ben-Shabat, Cristian Rodriguez, Stephen Gould

    RELATED PAPERS
    📚IKEA ASM Dataset
    📚CLIP
    📚SlowFast

    LINKS AND RESOURCES
    📚 Paper
    💻Project page
    💻Dataset page
    💻Code

    SPONSOR
    This episode was sponsored by YOOM. YOOM is an Israeli startup dedicated to volumetric video creation. They were voted as the 2022 best start-up to work for by Dun’s 100.
    Join their team that works on geometric deep learning research, implicit representations of 3D humans, NeRFs, and 3D/4D generative models.

    Visit YOOM

    For job opportunities with YOOM visit https://www.yoom.com/careers/

    CONTACT

    If you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: [email protected]

    This episode was recorded on May 1st, 2023.

    #talkingpapers #CVPR2023 #IAWDataset #ComputerVision #AI #ActionRecognition #DeepLearning #MachineLearning #research #artificialintelligence #podcasts

    🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com

    📧Subscribe to our mailing list: http://eepurl.com/hRznqb

    🐦Follow us on Twitter: https://twitter.com/talking_papers

    🎥YouTube Channel: https://bit.ly/3eQOgwP

  • All links are available in the blog post: https://www.itzikbs.com/inr2vec/

    In this episode of the Talking Papers Podcast, I hosted Luca De Luigi. We had a great chat about his paper "Deep Learning on Implicit Neural Representations of Shapes", AKA INR2Vec, published at ICLR 2023.

    In this paper, they take implicit neural representations to the next level and use them as input signals for neural networks to solve multiple downstream tasks. The core idea was captured by one of the authors in a very catchy and concise tweet: "Signals are networks so networks are data and so networks can process other networks to understand and generate signals".
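    Here is a toy, end-to-end illustration of the "networks as data" idea (my own sketch, not the inr2vec architecture): overfit a tiny INR to a signal, flatten its weights into a vector, and hand that vector to a downstream encoder:

    ```python
    import torch
    import torch.nn as nn

    def fit_inr(signal, steps=500):
        """Overfit a tiny MLP (the INR) to a 1D signal sampled on [0, 1]."""
        inr = nn.Sequential(nn.Linear(1, 32), nn.ReLU(),
                            nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 1))
        x = torch.linspace(0, 1, signal.numel()).unsqueeze(1)
        opt = torch.optim.Adam(inr.parameters(), lr=1e-3)
        for _ in range(steps):
            loss = ((inr(x).squeeze(1) - signal) ** 2).mean()
            opt.zero_grad(); loss.backward(); opt.step()
        return inr

    def flatten_weights(inr):
        """The INR itself becomes the data point: one flat parameter vector."""
        return torch.cat([p.detach().flatten() for p in inr.parameters()])

    inr = fit_inr(torch.sin(torch.linspace(0, 6.28, 128)))
    w = flatten_weights(inr)                   # the input "signal" for downstream nets
    encoder = nn.Sequential(nn.Linear(w.numel(), 64), nn.ReLU(), nn.Linear(64, 16))
    embedding = encoder(w)                     # a (16,) embedding of the INR
    print(w.shape, embedding.shape)
    ```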

    Luca recently received his PhD from the University of Bologna and is currently working at eyecan.ai, a startup based in Bologna. His research focus is on neural representations of signals, especially for 3D geometry. To be honest, I knew I wanted to get Luca on the podcast the second I saw the paper on arXiv, because I was working on a related topic but had to shelve it due to time management issues. This paper got me excited about that topic again. I didn't know Luca before recording the episode and it was a delight to get to know him and his work.


    AUTHORS
    Luca De Luigi, Adriano Cardace, Riccardo Spezialetti, Pierluigi Zama Ramirez, Samuele Salti, Luigi Di Stefano

    ABSTRACT
    When applied to 3D shapes, INRs allow to overcome the fragmentation and shortcomings of the popular discrete representations used so far. Yet, considering that INRs consist in neural networks, it is not clear whether and how it may be possible to feed them into deep learning pipelines aimed at solving a downstream task. In this paper, we put forward this research problem and propose inr2vec, a framework that can compute a compact latent representation for an input INR in a single inference pass. We verify that inr2vec can embed effectively the 3D shapes represented by the input INRs and show how the produced embeddings can be fed into deep learning pipelines to solve several tasks by processing exclusively INRs.

    RELATED PAPERS
    📚SIREN
    📚DeepSDF
    📚PointNet

    LINKS AND RESOURCES
    📚 Paper
    💻Project page


    SPONSOR
    This episode was sponsored by YOOM. YOOM is an Israeli startup dedicated to volumetric video creation. They were voted as the 2022 best start-up to work for by Dun’s 100.
    Join their team that works on geometric deep learning research, implicit representations of 3D humans, NeRFs, and 3D/4D generative models.

    Visit https://www.yoom.com/

    For job opportunities with YOOM visit https://www.yoom.com/careers/

    CONTACT

    If you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: [email protected]


    This episode was recorded on March 22, 2023.


    #talkingpapers #ICLR2023 #INR2Vec #ComputerVision #AI #DeepLearning #MachineLearning #INR #ImplicitNeuralRepresentation #research #artificialintelligence #podcasts

    🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com

    📧Subscribe to our mailing list: http://eepurl.com/hRznqb

    🐦Follow us on Twitter: https://twitter.com/talking_papers

    🎥YouTube Channel: https://bit.ly/3eQOgwP

    In this episode of the Talking Papers Podcast, I hosted Yael Vinker. We had a great chat about her paper "CLIPasso: Semantically-Aware Object Sketching", SIGGRAPH 2022 best paper award winner.

    In this paper, they convert images into sketches with different levels of abstraction. They avoid the need for sketch datasets by using the well-known CLIP model to distil the semantic concepts from sketches and images. There is no network training here, just optimizing the control points of Bezier curves to model the sketch strokes (initialized by a saliency map). How is this differentiable? They use a differentiable rasterizer. The degree of abstraction is controlled by the number of strokes. Don't miss the amazing demo they created.
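    To show how the CLIP side of the objective can be wired up, here is a rough sketch (mine, not the authors' code) of a CLIP-based perceptual loss between a rendered sketch and the target photo, using the open_clip package; the differentiable rasterizer that would actually produce the sketch raster is out of scope here, so a random tensor stands in for it:

    ```python
    import torch
    import open_clip

    model, _, _ = open_clip.create_model_and_transforms('ViT-B-32', pretrained='openai')
    model.eval()
    for p in model.parameters():
        p.requires_grad_(False)

    CLIP_MEAN = torch.tensor([0.48145466, 0.4578275, 0.40821073]).view(1, 3, 1, 1)
    CLIP_STD = torch.tensor([0.26862954, 0.26130258, 0.27577711]).view(1, 3, 1, 1)

    def clip_perceptual_loss(sketch, photo):
        """1 - cosine similarity between CLIP image embeddings of the rendered
        sketch and the target photo. Both are (1, 3, 224, 224) tensors in [0, 1];
        gradients flow back into `sketch` (and from there into the stroke
        parameters via the differentiable rasterizer in the real pipeline)."""
        def embed(img):
            return model.encode_image((img - CLIP_MEAN) / CLIP_STD)
        e_sketch = embed(sketch)
        with torch.no_grad():
            e_photo = embed(photo)
        return 1.0 - torch.cosine_similarity(e_sketch, e_photo).mean()

    sketch = torch.rand(1, 3, 224, 224, requires_grad=True)   # stand-in for a raster
    photo = torch.rand(1, 3, 224, 224)
    loss = clip_perceptual_loss(sketch, photo)
    loss.backward()                                           # gradients reach the sketch
    ```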

    Yael is currently a PhD student at Tel Aviv University. Her research focus is on computer vision, machine learning, and computer graphics, with a unique twist of combining art and technology. This work was done as part of her internship at EPFL.

    AUTHORS

    Yael Vinker, Ehsan Pajouheshgar, Jessica Y. Bo, Roman Bachmann, Amit Haim Bermano, Daniel Cohen-Or, Amir Zamir, Ariel Shamir

    ABSTRACT

    Abstraction is at the heart of sketching due to the simple and minimal nature of line drawings. Abstraction entails identifying the essential visual properties of an object or scene, which requires semantic understanding and prior knowledge of high-level concepts. Abstract depictions are therefore challenging for artists, and even more so for machines. We present an object sketching method that can achieve different levels of abstraction, guided by geometric and semantic simplifications. While sketch generation methods often rely on explicit sketch datasets for training, we utilize the remarkable ability of CLIP (Contrastive-Language-Image-Pretraining) to distil semantic concepts from sketches and images alike. We define a sketch as a set of Bézier curves and use a differentiable rasterizer to optimize the parameters of the curves directly with respect to a CLIP-based perceptual loss. The abstraction degree is controlled by varying the number of strokes. The generated sketches demonstrate multiple levels of abstraction while maintaining recognizability, underlying structure, and essential visual components of the subject drawn.

    RELATED PAPERS

    📚CLIP: Connecting Text and Images

    📚Differentiable Vector Graphics Rasterization for Editing and Learning

    LINKS AND RESOURCES

    📚 Paper

    💻Project page


    SPONSOR

    This episode was sponsored by YOOM. YOOM is an Israeli startup dedicated to volumetric video creation. They were voted as the 2022 best start-up to work for by Dun’s 100.
    Join their team that works on geometric deep learning research, implicit representations of 3D humans, NeRFs, and 3D/4D generative models.


    Visit YOOM.com.

    CONTACT

    If you would like to be a guest, sponsor or share your thoughts, feel free to reach out via email: [email protected]


    🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com

    📧Subscribe to our mailing list: http://eepurl.com/hRznqb

    🐦Follow us on Twitter: https://twitter.com/talking_papers

    🎥YouTube Channel: https://bit.ly/3eQOgwP

  • All links are available in the blog post.

    In this episode of the Talking Papers Podcast, we hosted Amir Belder. We had a great chat about his paper "Random Walks for Adversarial Meshes”, published in SIGGRAPH 2022.

    In this paper, they take on the task of creating an adversarial attack for triangle meshes. This is a non-trivial task since meshes are irregular. To handle the irregularity they use random walks over the mesh instead of the raw mesh. On top of that, they train an imitating network that mimics the predictions of the attacked network and use its gradients to perturb the input points.
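    The black-box recipe in a nutshell, as a generic toy on feature vectors rather than meshes and random walks (my own sketch, not the paper's code): train an imitator on the victim's predictions, then use the imitator's gradients to craft a small perturbation of the input:

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    torch.manual_seed(0)

    # black-box victim: we can only query its predictions, not its gradients
    victim = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
    for p in victim.parameters():
        p.requires_grad_(False)

    # step 1: train an imitating network to mimic the victim's outputs
    imitator = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
    opt = torch.optim.Adam(imitator.parameters(), lr=1e-3)
    for _ in range(2000):
        x = torch.randn(64, 16)
        loss = ((imitator(x) - victim(x)) ** 2).mean()   # match the victim's predictions
        opt.zero_grad(); loss.backward(); opt.step()

    # step 2: use the imitator's gradients to perturb one input (FGSM-style step)
    x = torch.randn(1, 16, requires_grad=True)
    label = victim(x).argmax(dim=1)
    F.cross_entropy(imitator(x), label).backward()
    x_adv = x + 0.5 * x.grad.sign()                      # small adversarial perturbation
    print(victim(x).argmax().item(), victim(x_adv).argmax().item())
    ```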

    Amir is currently a PhD student at the Computer Graphics and Multimedia Lab at the Technion Israel Institute of Technology. His research focus is on computer graphics, geometric processing, and machine learning. We spend a lot of time together at the lab and often chat about science, papers, and where the field is headed. Having this paper published was a great opportunity to share one of these conversations with you.

    AUTHORS
    Amir Belder, Gal Yefet, Ran Ben-Itzhak, Ayellet Tal

    ABSTRACT
    A polygonal mesh is the most-commonly used representation of surfaces in computer graphics. Therefore, it is not surprising that a number of mesh classification networks have recently been proposed. However, while adversarial attacks are wildly researched in 2D, the field of adversarial meshes is under explored. This paper proposes a novel, unified, and general adversarial attack, which leads to misclassification of several state-of-the-art mesh classification neural networks. Our attack approach is black-box, i.e. it has access only to the network’s predictions, but not to the network’s full architecture or gradients. The key idea is to train a network to imitate a given classification network. This is done by utilizing random walks along the mesh surface, which gather geometric information. These walks provide insight onto the regions of the mesh that are important for the correct prediction of the given classification network. These mesh regions are then modified more than other regions in order to attack the network in a manner that is barely visible to the naked eye.

    RELATED PAPERS
    📚Explaining and Harnessing Adversarial Examples
    📚Meshwalker: Deep mesh understanding by random walks

    LINKS AND RESOURCES
    📚 Paper
    💻Code

    To stay up to date with Amir's latest research, follow him on:
    🐦Twitter
    👨🏻‍🎓Google Scholar
    👨🏻‍🎓LinkedIn


    CONTACT

    If you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: [email protected]

    This episode was recorded on November 23rd 2022.

    #talkingpapers #SIGGRAPH2022 #RandomWalks #MeshWalker #AdversarialAttacks #Mesh #ComputerVision #AI #DeepLearning #MachineLearning #ComputerGraphics #research #artificialintelligence #podcasts

    🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com

    📧Subscribe to our mailing list: http://eepurl.com/hRznqb

    🐦Follow us on Twitter: https://twitter.com/talking_papers

    🎥YouTube Channel: https://bit.ly/3eQOgwP

  • In this episode of the Talking Papers Podcast, I hosted Silvia Sellán. We had a great chat about her paper "Stochastic Poisson Surface Reconstruction”, published in SIGGRAPH Asia 2022.

    In this paper, they take on the task of surface reconstruction with a probabilistic twist. They generalize the well-known Poisson surface reconstruction algorithm to give it a full statistical formalism. Essentially, their method quantifies the uncertainty of surface reconstruction from a point cloud: instead of outputting an implicit function, they represent the shape as a modified Gaussian process. This unique perspective and interpretation enables statistical queries, for example: given a point, is it on the surface? Is it inside the shape?
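    Once the reconstruction is a distribution rather than a single implicit function, point-wise queries become one-liners. A small sketch of the idea (my own illustration, not the paper's code), assuming the implicit value at a query point has a Gaussian posterior with negative values meaning "inside":

    ```python
    import numpy as np
    from scipy.stats import norm

    def prob_inside(mu: np.ndarray, sigma: np.ndarray) -> np.ndarray:
        """Probability that query points lie inside the solid, given a Gaussian
        posterior over the implicit value at each point (negative = inside)."""
        return norm.cdf(-mu / sigma)

    def prob_on_surface(mu, sigma, band=1e-2):
        """Probability that the implicit value falls within a thin band around zero."""
        return norm.cdf((band - mu) / sigma) - norm.cdf((-band - mu) / sigma)

    mu = np.array([-0.30, 0.00, 0.25])      # posterior means at three query points
    sigma = np.array([0.10, 0.10, 0.40])    # posterior standard deviations
    print(prob_inside(mu, sigma))           # approx. [0.999, 0.5, 0.27]
    print(prob_on_surface(mu, sigma))
    ```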

    Silvia is currently a PhD student at the University of Toronto. Her research focus is on computer graphics and geometric processing. She is a Vanier Doctoral Scholar, an Adobe Research Fellow, and the winner of the 2021 UofT FAS Dean's Doctoral Excellence Scholarship. I have been following Silvia's work for a while, and since I have some work on surface reconstruction, I knew the moment SPSR came out that I wanted to host her on the podcast (and gladly she agreed). Silvia is currently looking for postdoc and faculty positions to start in the fall of 2024, and I am really looking forward to seeing which institute snatches her up.

    In our conversation, I particularly liked her explanation of Gaussian Processes with the example "How long does it take my supervisor to answer an email as a function of the time of day the email was sent", You can't read that in any book. But also, we took an unexpected pause from the usual episode structure to discuss the question of "papers" as a medium for disseminating research. Don't miss it.


    AUTHORS
    Silvia Sellán, Alec Jacobson

    ABSTRACT
    We introduce a statistical extension of the classic Poisson Surface Reconstruction (PSR) algorithm for recovering shapes from 3D point clouds. Instead of outputting an implicit function, we represent the reconstructed shape as a modified Gaussian Process, which allows us to conduct statistical queries (e.g., the likelihood of a point in space being on the surface or inside a solid). We show that this perspective: improves PSR's integration into the online scanning process, broadens its application realm, and opens the door to other lines of research such as applying task-specific priors.

    RELATED PAPERS
    📚Poisson Surface Reconstruction

    📚Geometric Priors for Gaussian Process Implicit Surfaces

    📚Gaussian processes for machine learning


    LINKS AND RESOURCES

    📚 Paper

    💻Project page


    To stay up to date with Silvia's latest research, follow her on:

    🐦Twitter

    👨🏻‍🎓Google Scholar

    🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com

    📧Subscribe to our mailing list: http://eepurl.com/hRznqb

    🐦Follow us on Twitter: https://twitter.com/talking_papers

    🎥YouTube Channel: https://bit.ly/3eQOgwP

    In this episode of the Talking Papers Podcast, I hosted Sameera Ramasinghe. We had a great chat about his paper "Beyond Periodicity: Towards a Unifying Framework for Activations in Coordinate-MLPs", published at ECCV 2022 as an oral presentation.

    In this paper, they propose a new family of activation functions for coordinate MLPs and provide a theoretical analysis of their effectiveness. Their main proposition is that the stable rank is a good measure and design tool for such activation functions. They show that their proposed activations outperform the traditional ReLU and Sine activations for image parametrization and novel view synthesis. They further show that while the proposed family of activations does not require positional encoding they can benefit from using it by reducing the number of layers significantly.
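    As a taste of how simple these activations are to use, here is a minimal coordinate MLP with a Gaussian activation, one member of the proposed family, fitting a high-frequency 1D signal straight from raw coordinates with no positional encoding (my own toy, not the authors' code):

    ```python
    import torch
    import torch.nn as nn

    class Gaussian(nn.Module):
        """Gaussian activation exp(-x^2 / (2 a^2)); a controls the bandwidth."""
        def __init__(self, a: float = 0.1):
            super().__init__()
            self.a = a
        def forward(self, x):
            return torch.exp(-(x ** 2) / (2 * self.a ** 2))

    mlp = nn.Sequential(nn.Linear(1, 64), Gaussian(),
                        nn.Linear(64, 64), Gaussian(), nn.Linear(64, 1))

    # fit a high-frequency 1D signal directly from raw coordinates
    x = torch.linspace(-1, 1, 512).unsqueeze(1)
    y = torch.sin(20 * x) + 0.5 * torch.sin(53 * x)
    opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)
    for step in range(1000):
        loss = ((mlp(x) - y) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    print(loss.item())
    ```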

    Sameera is currently an applied scientist at Amazon and the CTO and co-founder of ConscientAI. His research focus is theoretical machine learning and computer vision. This work was done when he was a postdoc at the Australian Institute of Machine Learning (AIML). He completed his PhD at the Australian National University (ANU). We first met back in 2019 when I was a research fellow at ANU and he was still doing his PhD. I immediately noticed we share research interests and after a short period of time, I flagged him as a rising star in the field. It was a pleasure to chat with Sameera and I am looking forward to reading his future papers.

    AUTHORS

    Sameera Ramasinghe, Simon Lucey

    RELATED PAPERS

    📚NeRF

    📚SIREN

    📚"Fourier Features Let Networks Learn High-Frequency Functions in Low Dimensional Domains"

    📚On the Spectral Bias of Neural Networks


    LINKS AND RESOURCES

    📚 Paper

    💻Code

    To stay up to date with Sameera's latest research, follow him on:

    🐦Twitter

    👨🏻‍🎓Google Scholar

    👨🏻‍🎓LinkedIn

    Recorded on November 14th 2022.

    🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com

    📧Subscribe to our mailing list: http://eepurl.com/hRznqb

    🐦Follow us on Twitter: https://twitter.com/talking_papers

    🎥YouTube Channel: https://bit.ly/3eQOgwP

    In this episode of the Talking Papers Podcast, I hosted Marko Mihajlovic. We had a great chat about his paper "KeypointNeRF: Generalizing Image-based Volumetric Avatars using Relative Spatial Encoding of Keypoints", published at ECCV 2022.

    In this paper, they create a generalizable NeRF for virtual avatars. To get a high-fidelity reconstruction of humans (from sparse observations), they leverage an off-the-shelf keypoint detector in order to condition the NeRF.
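    The conditioning idea in miniature (my own toy, not the paper's exact encoding): rather than encoding a query point's absolute position, encode it relative to a sparse set of detected 3D keypoints, which makes the features invariant to where the subject happens to stand:

    ```python
    import torch

    def relative_keypoint_encoding(query, keypoints, sigma=0.1):
        """query: (N, 3) sample points; keypoints: (K, 3) detected 3D keypoints.
        Returns (N, K*4): per-keypoint offsets plus a Gaussian proximity weight.
        Because everything is relative, translating the whole subject leaves the
        encoding unchanged -- the property that helps generalization."""
        offsets = query[:, None, :] - keypoints[None, :, :]          # (N, K, 3)
        weights = torch.exp(-(offsets.norm(dim=-1, keepdim=True) ** 2) / (2 * sigma ** 2))
        feats = torch.cat([offsets, weights], dim=-1)                # (N, K, 4)
        return feats.reshape(query.shape[0], -1)

    enc = relative_keypoint_encoding(torch.randn(1024, 3), torch.randn(13, 3))
    print(enc.shape)   # torch.Size([1024, 52])
    ```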

    Marko is a 2nd year PhD student at ETH, supervised by Siyu Tang. His research focuses on photorealistic reconstruction of static and dynamic scenes and also modeling of parametric human bodies. This work was done mainly during his internship at Meta Reality Labs. Marko and I met at CVPR 2022.

    AUTHORS
    Marko Mihajlovic, Aayush Bansal, Michael Zollhoefer, Siyu Tang, Shunsuke Saito

    ABSTRACT
    Image-based volumetric humans using pixel-aligned features promise generalization to unseen poses and identities. Prior work leverages global spatial encodings and multi-view geometric consistency to reduce spatial ambiguity. However, global encodings often suffer from overfitting to the distribution of the training data, and it is difficult to learn multi-view consistent reconstruction from sparse views. In this work, we investigate common issues with existing spatial encodings and propose a simple yet highly effective approach to modeling high-fidelity volumetric humans from sparse views. One of the key ideas is to encode relative spatial 3D information via sparse 3D keypoints. This approach is robust to the sparsity of viewpoints and cross-dataset domain gap. Our approach outperforms state-of-the-art methods for head reconstruction. On human body reconstruction for unseen subjects, we also achieve performance comparable to prior work that uses a parametric human body model and temporal feature aggregation. Our experiments show that a majority of errors in prior work stem from an inappropriate choice of spatial encoding and thus we suggest a new direction for high-fidelity image-based human modeling.

    RELATED PAPERS
    📚NeRF
    📚IBRNet
    📚PIFu

    LINKS AND RESOURCES
    💻Project website
    📚 Paper
    💻Code
    🎥Video

    To stay up to date with Marko's latest research, follow him on:
    👨🏻‍🎓Personal Page
    🐦Twitter
    👨🏻‍🎓Google Scholar

    CONTACT

    If you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: talking.papers.podcast@gmail

    🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com

    📧Subscribe to our mailing list: http://eepurl.com/hRznqb

    🐦Follow us on Twitter: https://twitter.com/talking_papers

    🎥YouTube Channel: https://bit.ly/3eQOgwP

  • In this episode of the Talking Papers Podcast, I hosted David B. Lindell to chat about his paper "BACON: Band-Limited Coordinate Networks for Multiscale Scene Representation”, published in CVPR 2022.

    In this paper, they take on the design of coordinate networks by introducing a new type of neural network architecture that has an analytical Fourier spectrum. This enables multi-scale signal representation and yields an interpretable architecture with an explicitly controllable bandwidth.
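    The structural trick can be sketched with a generic multiplicative-filter-style network (my paraphrase, not the exact BACON code): each layer multiplies the running features by a sinusoid of the raw input, so the output spectrum is a combination of the chosen filter frequencies and its bandwidth can be bounded by construction:

    ```python
    import torch
    import torch.nn as nn

    class BandLimitedNet(nn.Module):
        """Multiplicative-filter-style coordinate network: the output is a product
        of sine filters of the raw input, so its Fourier support is a bounded
        combination of the chosen filter frequencies."""
        def __init__(self, in_dim=1, hidden=64, layers=3, max_freq=32.0):
            super().__init__()
            self.filters = nn.ModuleList([nn.Linear(in_dim, hidden) for _ in range(layers)])
            self.linears = nn.ModuleList([nn.Linear(hidden, hidden) for _ in range(layers - 1)])
            self.out = nn.Linear(hidden, 1)
            for f in self.filters:                      # cap each filter's frequency
                nn.init.uniform_(f.weight, -max_freq / layers, max_freq / layers)

        def forward(self, x):
            z = torch.sin(self.filters[0](x))
            for lin, filt in zip(self.linears, self.filters[1:]):
                z = lin(z) * torch.sin(filt(x))         # multiply filters, don't compose them
            return self.out(z)

    net = BandLimitedNet()
    x = torch.linspace(-1, 1, 256).unsqueeze(1)
    print(net(x).shape)   # torch.Size([256, 1])
    ```

    Because sin(a) * sin(b) only contains the frequencies a + b and a - b, stacking such layers keeps the maximum output frequency at the sum of the per-layer filter frequencies, which is what makes the spectrum analyzable.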

    David recently completed his Postdoc at Stanford and will join the University of Toronto as an Assistant Professor. During our chat, I got to know a stellar academic with a unique view of the field and where it is going. We even got to meet in person at CVPR. I am looking forward to seeing what he comes up with next. It was a pleasure having him on the podcast.

    AUTHORS
    David B. Lindell, Dave Van Veen, Jeong Joon Park, Gordon Wetzstein

    ABSTRACT
    Coordinate-based networks have emerged as a powerful tool for 3D representation and scene reconstruction. These networks are trained to map continuous input coordinates to the value of a signal at each point. Still, current architectures are black boxes: their spectral characteristics cannot be easily analyzed, and their behavior at unsupervised points is difficult to predict. Moreover, these networks are typically trained to represent a signal at a single scale, so naive downsampling or upsampling results in artifacts. We introduce band-limited coordinate networks (BACON), a network architecture with an analytical Fourier spectrum. BACON has constrained behavior at unsupervised points, can be designed based on the spectral characteristics of the represented signal, and can represent signals at multiple scales without per-scale supervision. We demonstrate BACON for multiscale neural representation of images, radiance fields, and 3D scenes using signed distance functions and show that it outperforms conventional single-scale coordinate networks in terms of interpretability and quality.

    RELATED PAPERS
    📚SIREN

    📚Multiplicative Filter Networks (MFN)

    📚Mip-Nerf

    📚Followup work: Residual MFN


    LINKS AND RESOURCES
    💻Project website

    📚 Paper

    💻Code

    🎥Video


    To stay up to date with David's latest research, follow him on:
    👨🏻‍🎓Personal Page

    🐦Twitter

    👨🏻‍🎓Google Scholar

    👨🏻‍🎓LinkedIn


    Recorded on June 15th 2022.

    CONTACT

    If you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: [email protected]

    🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com

    📧Subscribe to our mailing list: http://eepurl.com/hRznqb

    🐦Follow us on Twitter: https://twitter.com/talking_papers

    🎥YouTube Channel: https://bit.ly/3eQOgwP