Type annotation requires UDFs to return a type instead of an array #1516

@progval

Describe the bug
Here:

_R = TypeVar("_R", bound=pa.DataType)

This binds _R to pyarrow.DataType, so _R can only be DataType or one of its subclasses.

In these two places, func is annotated as Callable[..., _R], which requires the function to return a value of type _R (i.e. it should return a pyarrow data type):

def __init__(
    self,
    name: str,
    func: Callable[..., _R],
    input_fields: list[pa.Field],
    return_field: _R,
    volatility: Volatility | str,
) -> None:

@overload
@staticmethod
def udf(
    func: Callable[..., _R],
    input_fields: Sequence[pa.DataType | pa.Field] | pa.DataType | pa.Field,
    return_field: pa.DataType | pa.Field,
    volatility: Volatility | str,
    name: str | None = None,
) -> ScalarUDF: ...

This seems incorrect: a UDF should return an array containing values of that type, not the data type itself.
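To make the mismatch concrete, here is a minimal sketch that runs without pyarrow or datafusion installed. Everything in it is an assumption made for illustration: `DataType` and `Array` are stand-ins for `pyarrow.DataType` and `pyarrow.Array`, and `udf_current`, `udf_proposed`, and `_A` are hypothetical names, not part of the library. It is one possible direction for a fix, not necessarily the one the maintainers would choose.

```python
from __future__ import annotations

from typing import Any, Callable, TypeVar


# Stand-ins for pyarrow.DataType and pyarrow.Array (assumption: simplified
# so the sketch runs without pyarrow installed).
class DataType:
    def __init__(self, name: str) -> None:
        self.name = name


class Array:
    def __init__(self, values: list[Any], dtype: DataType) -> None:
        self.values = values
        self.dtype = dtype


# Same bound as in the issue: _R must be a DataType (or subclass).
_R = TypeVar("_R", bound=DataType)
# Hypothetical alternative: bound the callable's return type to Array.
_A = TypeVar("_A", bound=Array)


def udf_current(func: Callable[..., _R], return_field: DataType) -> Callable[..., _R]:
    """Mirrors the quoted annotation: func is required to return a DataType."""
    return func


def udf_proposed(func: Callable[..., _A], return_field: DataType) -> Callable[..., _A]:
    """Hypothetical fix: func returns an array; return_field names its type."""
    return func


INT64 = DataType("int64")


def double(values: Array) -> Array:
    # A UDF body works on whole arrays and returns an array.
    return Array([v * 2 for v in values.values], values.dtype)


# Fine at runtime, but a type checker is expected to reject this call,
# because Array does not satisfy the DataType bound on _R.
rejected = udf_current(double, INT64)

# Passes both at runtime and under the Array bound.
doubled = udf_proposed(double, INT64)(Array([1, 2, 3], INT64))
```

The failure is purely static: both calls behave identically at runtime, which is why the problem only shows up when running mypy on user code.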

To Reproduce

datafusion-python ((53.0.0)) $ mypy examples/python-udf.py
examples/python-udf.py:27: error: Value of type variable "_R" of function cannot be "Array[Any]"  [type-var]
Found 1 error in 1 file (checked 1 source file)

Expected behavior
mypy should pass

Additional context
