Closed
Open
Issue created Dec 30, 2019 by Administrator (@root), Contributor

Corrupt file causes program to exit prematurely despite try/except block

Created by: RoscoeTheDog

Overview

When decoding inside a try/except block, PyAV encounters a corrupt file and the entire application exits prematurely; the exception is never caught. Normally the except block would catch the exception and the program would continue running.

Expected behavior

An exception is raised for the file currently being processed, the except block catches it, and the program continues as usual.

Actual behavior

Exception thrown:

Format flac detected only with low score of 1, misdetection possible!
Could not find codec parameters for stream 0 (Audio: flac, 0 channels): unspecified sample format
Consider increasing the value for the 'analyzeduration' and 'probesize' options

Reproduction

Pass the path of the unzipped file to the function below and run it. Add a few print statements after the function call to see that the rest of the program never runs.

import av

def pyav_decode(path: str) -> dict:
    """
    :param path: :str: "path to file"
    :return: :dict: "dict of metadata from file"
    """

    # declare different types of streams (defaults to false)
    file_meta =\
        {
            'a_stream': False,
            'v_stream': False,
            'i_stream': False,
            'succeeded': False,
        }

    try:
        # TODO: investigate why open() can take a while to return in some cases (seems to be mostly video).
        container = av.open(path)

        """
            Note:
            
                Values that == 0 mean not known or is a False Positive.
        
                Decode audio channels first for efficiency (it has the highest probability to fail).
        """

        for frame in container.decode(audio=0):
            channels = frame.layout.channels  # tuple of channel objects
            counter = len(channels)
            if counter != 0:
                file_meta['channels'] = counter
                file_meta['channel_layout'] = frame.layout.name
            break  # Do not decode all frames for audio channel info

        # decode file's bit-rate
        if not int(container.bit_rate / 1000) == 0:
            file_meta['a_bit_rate'] = container.bit_rate / 1000

        # decode file's streams
        for s in container.streams:

            """
                Certain properties from Images (such as stream type) can be mistaken as Video.
                Check the decoder's format name to determine if == image or video.
            """

            # IMAGE STREAMS
            file_meta['i_stream'] = is_image_stream(s.codec_context.format.name)

            if file_meta['i_stream'] is True:  # skip current working stream if == Image type
                continue

            # VIDEO STREAMS
            elif s.type == 'video':

                file_meta['v_stream'] = True

                """
                    - PyAV library does not always return v_duration reliably, but is the fastest method.
                    
                    - FFprobe is an alternative whenever v_duration is not returned.
                """

                file_meta['v_duration'] = s.metadata.get('DURATION', '')

                if file_meta['v_duration'] == '':
                    # ffprobe(), parse_ffprobe() and validate_keys() are project helpers not shown in this report
                    stdout, stderr = ffprobe(path)
                    file_meta = parse_ffprobe(stdout, stderr)
                    file_meta = validate_keys(file_meta)
                    break

                # decode video container's resolution
                if not s.width == 0:
                    file_meta['v_width'] = s.width
                if not s.height == 0:
                    file_meta['v_height'] = s.height

                # decode actual encoded resolution of video
                if not s.coded_width == 0:
                    file_meta['v_buffer_width'] = s.coded_width
                if not s.coded_height == 0:
                    file_meta['v_buffer_height'] = s.coded_height

                file_meta['nb_frames'] = s.frames

                if s.frames == 0:
                    file_meta['nb_frames'] = s.metadata.get('NUMBER_OF_FRAMES', '')

                # decode frame-rate (returned in fraction format)
                if not int(s.rate) == 0:
                    file_meta['v_frame_rate'] = float(s.rate)

                # decode video format
                if s.pix_fmt:
                    file_meta['v_pix_fmt'] = s.pix_fmt

            # AUDIO STREAMS
            elif s.type == 'audio':
                file_meta['a_stream'] = True

                # decode sample format
                if s.format.name:
                    file_meta['a_sample_fmt'] = s.format.name

                # decode sample rate
                if not int(s.sample_rate) == 0:
                    file_meta['a_sample_rate'] = s.sample_rate

                # decode bit depth (note: 24 bit will show as 32 -- check sample_fmt for pcm_s24le instead)
                if not int(s.format.bits) == 0:
                    file_meta['a_bit_depth'] = s.format.bits

        # check dict keys for missing entries or 0s -- minimize decoding false positives into database
        file_meta = validate_keys(file_meta)
        file_meta['succeeded'] = True

    except Exception as e:
        file_meta['succeeded'] = False
        print(e)

    return file_meta

def is_image_stream(stream_fmt: str):
    """
    Normally we just check mimetypes to check how the file will be decoded before the function call.
    This function just adds a precautionary step during an event where a video/image is being decoded and frames are misinterpreted.

    :param stream_fmt: accepts string value of av_decode stream format
    :return: boolean value of image type
    """

    _stream = False

    if 'pipe' in stream_fmt:  # 'pipe' are typically image-type decoders
        _stream = True

    if stream_fmt in ['image2', 'tty', 'ico', 'gif']:  # list of some other image decoders
        _stream = True

    return _stream
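
The print-statement check described under Reproduction could look roughly like the sketch below. `run_batch` is a hypothetical helper name, not part of the original code; it takes the decode function as a parameter so that `pyav_decode` can be passed in directly.

```python
def run_batch(paths, decode):
    """Call decode() on each path, printing a marker before and after each call."""
    results = {}
    for path in paths:
        print(f"start: {path}", flush=True)
        results[path] = decode(path)
        # If the process dies inside decode(), this line is never printed.
        print(f"done:  {path} succeeded={results[path].get('succeeded')}", flush=True)
    return results
```

Calling `run_batch([good_file, corrupt_file, another_good_file], pyav_decode)` should print a "done" line for every file if the except block is working. A missing "done" line after the corrupt file would confirm that the process exited inside the call.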

Versions

Windows 10 Pro x64
Python 3.7.4
PyAV version: 6.2.0 (py37heb183d3_1, conda-forge)
FFmpeg 4.1.3, built with gcc 8.3.1 (GCC) 20190414

Additional context

The source code above only extracts metadata; it does not perform any encoding.

In every other case, a try/except block catches these problems and keeps the rest of the program running. With this file, the application exits prematurely and the except block is never reached.
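
One possible workaround, sketched under the assumption that the process is being terminated by native code rather than by a Python exception (the report does not establish the cause): run the decode in a separate interpreter, so that a hard exit only kills the child. The module name `metadata_tools` and the helper `decode_isolated` are hypothetical; only `pyav_decode` comes from the code above.

```python
import json
import subprocess
import sys

# Hypothetical worker script. "metadata_tools" is an assumed module name for
# wherever pyav_decode lives; it is not taken from the report.
WORKER = (
    "import json, sys\n"
    "from metadata_tools import pyav_decode\n"
    "print(json.dumps(pyav_decode(sys.argv[1])))\n"
)


def decode_isolated(path, worker=WORKER, timeout=60):
    """Run the worker in a child interpreter so a hard exit cannot kill the caller."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", worker, path],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return {"succeeded": False, "error": "timeout"}
    if proc.returncode != 0:
        return {"succeeded": False, "error": f"child exited with code {proc.returncode}"}
    try:
        # The worker prints the JSON result last; earlier lines may be log output.
        return json.loads(proc.stdout.splitlines()[-1])
    except (IndexError, ValueError):
        return {"succeeded": False, "error": "unparseable child output"}
```

The trade-off is the cost of starting one interpreter per file, so this is probably only worth doing for files that are already suspect.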

The problem file is attached below. It is clearly corrupt, with a file size of 0 bytes. Identifying a corrupt file is not the issue; the issue is handling it within the source code so that it does not bring down the rest of the program.
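
Since the attached file is reported as 0 bytes, a size check before `av.open()` is one way to sidestep this particular input. This is only a sketch and it does not address the underlying crash or other kinds of corruption. `has_content` is a hypothetical helper name.

```python
import os


def has_content(path):
    """Return True only if the file exists and is larger than zero bytes."""
    try:
        return os.path.getsize(path) > 0
    except OSError:
        # Missing or unreadable files are treated the same as empty ones.
        return False
```

In `pyav_decode`, this could be checked at the top of the function, returning `file_meta` with `'succeeded': False` when `has_content(path)` is false.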

03 Tycho - Slack.zip
