Write buffer protocol #356

Open
vallsv wants to merge 4 commits into Thriftpy:master from vallsv:write-memoryview
Conversation

@vallsv vallsv commented Apr 9, 2026

Hi,

This pull request adds write support for the Python buffer protocol to binary fields.

As a result, in addition to the already supported str and bytes, the API now accepts bytearray and memoryview. Everything uses the same new Cython write_buffer function. The pure Python writer was updated as well.

This avoids one memory copy when sending numpy arrays.

I have not checked the result with a thorough benchmark, but in my use case (arrays of about 16 MB, client and server on the same machine), I see an average time reduction of about 10%. For small messages, it makes no difference. I am not sure what a good benchmark would be.

Currently I use it this way:

struct NDArray {
    1: string dtype;
    2: list<i32> shape;
    3: binary buffer;
}

import numpy

def create_ndarray_record(array: numpy.ndarray):
    msg = NDArray()
    msg.shape = array.shape
    msg.dtype = array.dtype.str

    # msg.buffer = array.tobytes()  # <-- previous code, this creates a memory copy

    array = numpy.ascontiguousarray(array)
    msg.buffer = array.data

    return msg
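
To illustrate the difference between the copy made by `tobytes()` and the view exposed by `.data`, here is a minimal standard-library sketch. It uses a `bytearray` as a stand-in for a numpy array (an assumption made only to avoid depending on numpy), and the names `payload`, `copied`, and `view` are illustrative.

```python
# Stand-in for a large array: any object exposing the buffer protocol
# behaves the same way with respect to copying.
payload = bytearray(b"\x00" * 8)

copied = bytes(payload)      # new bytes object: the data is duplicated
view = memoryview(payload)   # no duplication: the view shares payload's memory

payload[0] = 0xFF            # mutate the original buffer in place

print(copied[0])             # the copy does not see the change
print(view[0])               # the view does
```

The same reasoning applies to `array.tobytes()` (copy) versus `array.data` (view) in the snippet above.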

The pull request does not add any dependency on numpy.

Would you be interested in such a contribution? Let me know how I could improve it.

Member

aisk commented Apr 10, 2026

The idea sounds reasonable to me. However, we should support the buffer protocol in general rather than only memoryview. Since memoryview, numpy.array, and torch.tensor all implement the buffer protocol, they could then be used in thriftpy2 directly.
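
As a rough sketch of what accepting any buffer-protocol object could look like in pure Python: `memoryview()` itself acts as the check, since it raises `TypeError` for objects that do not expose a buffer. The helper name `as_byte_view` is hypothetical and is not part of thriftpy2.

```python
from array import array


def as_byte_view(obj):
    """Return a flat, byte-oriented view over obj without copying.

    Hypothetical helper for illustration only.
    """
    view = memoryview(obj)  # TypeError if obj lacks the buffer protocol
    if not view.c_contiguous:
        raise ValueError("buffer must be C-contiguous")
    return view.cast("B")   # reinterpret as unsigned bytes, still no copy


# bytes, bytearray, and array.array all go through the same code path.
sizes = [
    as_byte_view(b"abc").nbytes,
    as_byte_view(bytearray(b"abcd")).nbytes,
    as_byte_view(array("i", [1, 2, 3])).nbytes,
]

# str does not expose a buffer, so it needs separate handling (e.g. encoding).
try:
    as_byte_view("text")
    str_rejected = False
except TypeError:
    str_rejected = True
```

Casting to format `"B"` makes `nbytes` and `len()` agree, which matters for typed buffers where `len()` counts elements rather than bytes.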

@vallsv vallsv force-pushed the write-memoryview branch from b2bc2dd to eaba2fc Compare April 10, 2026 21:00
@vallsv vallsv changed the title Write memoryview to buffer Write buffer protocol Apr 10, 2026
Author

vallsv commented Apr 10, 2026

@aisk thanks for your feedback. Here is a rework based on the buffer protocol.

I have also added a few tests to check that both binary and cybinary can serialize several objects exposing the buffer protocol.

Comment thread thriftpy2/protocol/binary.py Outdated
@vallsv vallsv force-pushed the write-memoryview branch from bc0d690 to 3612db8 Compare April 17, 2026 19:15
Author

vallsv commented Apr 17, 2026

struct.pack does not support memoryview.

It is possible to pack it like this:

import struct

def pack_memoryview(mem):
    return struct.pack("!i", len(mem)) + mem

But I chose to write it in two steps instead. I think that is better.
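
A minimal sketch of the two-step approach, assuming a writer object with a `write()` method (here `io.BytesIO` stands in for the transport). The function name `write_binary` is hypothetical and is not the code from this pull request. Concatenating the length prefix and the payload allocates a new bytes object the size of the payload, whereas two writes hand the view to the writer directly.

```python
import io
import struct


def write_binary(out, value):
    """Write a length-prefixed binary field in two steps.

    Hypothetical helper: `out` is any object with a write() method,
    `value` is any object exposing the buffer protocol.
    """
    view = memoryview(value).cast("B")          # byte view: nbytes is the payload size
    out.write(struct.pack("!i", view.nbytes))   # step 1: 4-byte big-endian length
    out.write(view)                             # step 2: payload, no concatenation


buf = io.BytesIO()
write_binary(buf, bytearray(b"hello"))
print(buf.getvalue())
```

Using `view.nbytes` instead of `len(mem)` keeps the prefix correct for typed buffers, where `len()` returns the number of elements rather than the number of bytes.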

@vallsv vallsv force-pushed the write-memoryview branch 2 times, most recently from 33975fc to 09308ac Compare April 17, 2026 19:33
Author

vallsv commented Apr 17, 2026

@aisk the problem here is that the pure Python binary module uses TCyMemoryBuffer.

That sits between two worlds.

I wonder whether it makes sense to propagate the change there. I'll take a look anyway.

vallsv and others added 2 commits April 17, 2026 23:21
Co-authored-by: AN Long <aisk@users.noreply.github.com>
@vallsv vallsv force-pushed the write-memoryview branch from 09308ac to 50f11ef Compare April 17, 2026 21:30