fix #30 by demon36 · Pull Request #254 · jnr/jnr-ffi

demon36 · 2021-07-06T08:46:54Z

fix #30, DirectMemoryIO.getString() fails for non-UTF-8 charsets

DirectMemoryIO.getString() fails for non-UTF-8 charsets

headius · 2021-07-07T22:58:07Z

Will review, thank you!

headius

Minor suggestions to improve the code. At the very least the formatting should be fixed, and any refinements you can make based on the other comments would be great!

headius · 2021-07-08T16:49:01Z

    public String getString(long offset, int maxLength, Charset cs) {
-        final byte[] bytes = IO.getZeroTerminatedByteArray(address() + offset, maxLength);
-        return cs.decode(ByteBuffer.wrap(bytes)).toString();
+        if(cs == StandardCharsets.UTF_8){


Please reformat according to more common Java formatting standards:

if (cond) { ... } else { ... for (stuff) { ...

Otherwise we get a mix of formats and future patches will likely include unhelpful formatting updates.

headius · 2021-07-08T16:50:47Z

    public String getString(long offset, int maxLength, Charset cs) {
-        final byte[] bytes = IO.getZeroTerminatedByteArray(address() + offset, maxLength);
-        return cs.decode(ByteBuffer.wrap(bytes)).toString();
+        if(cs == StandardCharsets.UTF_8){


I think this logic could apply to any Charset that has a minimum character width of 1 byte, correct? All such encodings should use a single-byte \0 for C string termination I think?

At the very least this should include the ISO-8859 encodings, which are all single-byte (their CharsetEncoder.maxBytesPerChar will all be 1.0).

demon36 · 2021-08-17

Agreed that I should include other charsets with a single-byte terminator, but CharsetEncoder.maxBytesPerChar() alone won't be sufficient, because it does not return 1.0 for UTF-8.
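To illustrate the UTF-8 point, here is a minimal sketch (not code from this PR) that measures how many bytes a NUL occupies in a given Charset and prints it next to CharsetEncoder.maxBytesPerChar(). It assumes the terminator width can be derived by encoding "\0\0" and "\0" and subtracting the lengths, which cancels out a byte-order mark that some encoders (such as Java's "UTF-16") write once per encode call. The class name is hypothetical; the PR's own helper is StringUtil.terminatorWidth(), whose implementation may differ.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class TerminatorWidth {
    private TerminatorWidth() {
    }

    /**
     * Returns how many bytes one NUL character occupies when encoded with cs.
     * Encoding two NULs and one NUL and subtracting the lengths cancels out any
     * byte-order mark the encoder writes once per call (Java's "UTF-16" does).
     */
    static int terminatorWidth(Charset cs) {
        return "\0\0".getBytes(cs).length - "\0".getBytes(cs).length;
    }

    public static void main(String[] args) {
        Charset[] charsets = {
            StandardCharsets.US_ASCII,
            StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_8,
            StandardCharsets.UTF_16,
            StandardCharsets.UTF_16LE
        };
        for (Charset cs : charsets) {
            // maxBytesPerChar() is the most bytes produced for any single
            // input char, not the width of the NUL terminator.
            float maxBytes = cs.newEncoder().maxBytesPerChar();
            System.out.println(cs.name() + ": NUL width = " + terminatorWidth(cs)
                    + ", maxBytesPerChar = " + maxBytes);
        }
    }
}
```

With this approach UTF-8, US-ASCII and ISO-8859-1 all report a 1-byte terminator, even though UTF-8's maxBytesPerChar() is greater than 1.0 (3.0 in OpenJDK).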

headius · 2021-07-08T17:02:05Z

+        }else{
+            byte[] bytes = new byte[maxLength];
+            IO.getByteArray(address() + offset, bytes, 0, maxLength);
+            final byte[] nullCharBytes = new String("\0").getBytes(cs);


This could possibly be cached for common encodings (UTF-16, UTF-32) or it might be valid to just use a width of \0 equal to the encoding's CharsetEncoder.maxBytesPerChar value.
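A hedged sketch of the caching idea, assuming a static ConcurrentHashMap keyed by Charset is acceptable in this codebase; TerminatorCache is a hypothetical name. It also guards against a pitfall in the `new String("\0").getBytes(cs)` call quoted above: for Java's "UTF-16" charset that call returns four bytes (a big-endian byte-order mark followed by the two-byte NUL), so the terminator would be measured as 4 bytes instead of 2.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class TerminatorCache {
    // Hypothetical cache: each Charset's NUL width is computed once, then reused.
    private static final Map<Charset, Integer> WIDTHS = new ConcurrentHashMap<>();

    private TerminatorCache() {
    }

    static int terminatorWidth(Charset cs) {
        return WIDTHS.computeIfAbsent(cs, TerminatorCache::computeWidth);
    }

    private static int computeWidth(Charset cs) {
        // Encoding a single NUL over-counts for "UTF-16", which prepends a
        // 2-byte BOM. The BOM is written once per call, so the difference
        // between two NULs and one NUL is the width of a single NUL.
        return "\0\0".getBytes(cs).length - "\0".getBytes(cs).length;
    }

    public static void main(String[] args) {
        Charset utf16 = StandardCharsets.UTF_16;
        System.out.println("naive width:  " + "\0".getBytes(utf16).length);
        System.out.println("cached width: " + terminatorWidth(utf16));
    }
}
```

UTF-16BE and UTF-16LE do not write a byte-order mark, so the naive call happens to work for them; only the unmarked "UTF-16" name is affected.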


headius · 2021-08-12T18:17:11Z

Request another review when you are ready. Thanks for keeping at it!

demon36 · 2021-08-12T18:35:18Z

This part of the code is quite tricky; I will keep you posted. Thanks!

demon36 · 2021-08-12T18:37:26Z

BTW, is the IRC channel still alive?


headius · 2021-08-16T17:07:23Z

@demon36 Ah, no, I expect the IRC channel, being on the now-defunct Freenode, is probably dead. I should set up a new channel on Matrix or Libera.Chat. For now, if you want to chat with those of us maintaining these libraries, stop by the #jruby channel on Matrix!

demon36 · 2021-08-18T06:02:10Z

@headius Good to know. The PR is ready for review; please also take a look at Struct.java.

headius

This is looking near completion. Main issues are:

- formatting throughout does not match Java conventions or the rest of the codebase
- no tests provided for the behavior

Almost there!

headius · 2021-08-18T15:35:55Z

-        return cs.decode(ByteBuffer.wrap(bytes)).toString();
+    	long baseAddress = address() + offset;
+    	int nullTermSize = StringUtil.terminatorWidth(cs);
+        if(nullTermSize == 1) {


Formatting throughout should match the rest of the code, which is intended to match typical Java coding conventions:

- four-space indentation for blocks of code, eight-space indentation for line continuations
- a space between keywords like if and while and their parenthesized conditionals
- spaces around operators like + and %

It's nitpicky but if we don't maintain consistent code formatting then we end up with future commits and PRs that have lots of unrelated changes.

demon36 · 2021-08-19

Agreed, I will take care of that.
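For reference, here is a sketch of the decoding step the getString() diff is working toward, written to the conventions listed above (four-space blocks, eight-space continuations, spaces after keywords and around operators). It is an illustrative, hypothetical helper operating on a byte[] already copied out of native memory, not the PR's implementation; the terminator width is passed in, as it might be from StringUtil.terminatorWidth(cs). The search only considers offsets that are multiples of the width, because in UTF-16 or UTF-32 a run of zero bytes can straddle two adjacent code units (for example, "a" followed by U+0100 in UTF-16LE is 61 00 00 01).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class NulTerminatedDecoder {
    private NulTerminatedDecoder() {
    }

    /**
     * Returns the offset of the first run of width zero bytes that starts on a
     * multiple of width. If there is none, returns length rounded down to a
     * whole number of code units so no partial character is decoded.
     */
    static int terminatorIndex(byte[] bytes, int length, int width) {
        for (int i = 0; i + width <= length; i += width) {
            boolean allZero = true;
            for (int j = 0; j < width; j++) {
                if (bytes[i + j] != 0) {
                    allZero = false;
                    break;
                }
            }
            if (allZero) {
                return i;
            }
        }
        return length - (length % width);
    }

    /** Decodes at most maxLength bytes, stopping at the first aligned terminator. */
    static String decode(byte[] bytes, int maxLength, Charset cs, int width) {
        if (width <= 0) {
            throw new IllegalArgumentException(
                    "terminator width must be positive: " + width);
        }
        int length = Math.min(maxLength, bytes.length);
        int end = terminatorIndex(bytes, length, width);
        return new String(bytes, 0, end, cs);
    }

    public static void main(String[] args) {
        // An unaligned search would stop at offset 1 (the 00 00 pair spanning
        // two code units); the aligned search finds the real NUL at offset 4.
        byte[] utf16 = "a\u0100\0".getBytes(StandardCharsets.UTF_16LE);
        System.out.println(decode(utf16, utf16.length, StandardCharsets.UTF_16LE, 2)
                .equals("a\u0100"));
    }
}
```

Running main prints true.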

headius · 2021-08-18T15:37:18Z

        // Find the NUL terminator and limit to that, so the
        // StringBuffer/StringBuilder does not have superfluous NUL chars
-        int end = indexOf(buffer, (byte) 0);
+        final byte[] nullCharBytes = new byte[StringUtil.terminatorWidth(charset)];


Perhaps we can cache the three known lengths and not reallocate this byte[] every time?
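One possible shape for that cache, sketched under the assumption that only widths 1, 2 and 4 occur in practice (single-byte encodings and UTF-8, UTF-16, UTF-32); Terminators.forWidth() is a hypothetical helper, not part of jnr-ffi. Because Java arrays cannot be made immutable, the shared instances are only safe if callers never write to them.

```java
final class Terminators {
    // Preallocated all-zero terminators for the widths seen in practice:
    // 1 byte (single-byte charsets and UTF-8), 2 bytes (UTF-16), 4 bytes (UTF-32).
    private static final byte[] NUL_1 = new byte[1];
    private static final byte[] NUL_2 = new byte[2];
    private static final byte[] NUL_4 = new byte[4];

    private Terminators() {
    }

    /**
     * Returns an all-zero array of the given width. For widths 1, 2 and 4 the
     * same shared instance is returned every time, so callers must not modify it.
     */
    static byte[] forWidth(int width) {
        switch (width) {
            case 1:
                return NUL_1;
            case 2:
                return NUL_2;
            case 4:
                return NUL_4;
            default:
                // Unexpected width: allocate a fresh array rather than fail.
                return new byte[width];
        }
    }

    public static void main(String[] args) {
        System.out.println(forWidth(2) == forWidth(2));
        System.out.println(forWidth(4).length);
    }
}
```

An alternative that avoids the array entirely is to compare each candidate position against zero directly while scanning, so no terminator byte[] is needed at all.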

headius · 2021-08-18T19:42:39Z

Looks like this might need a merge or rebase from master to pick up that missing import too.

headius · 2021-08-31T17:02:56Z

@demon36 Should we just close this in light of the work to do this all natively in jffi? I am inclined to keep this as fallback code when we do not have an updated jffi binary, but what are your thoughts going forward?

fix jnr#30

a88b8a6

DirectMemoryIO.getString() fails for non UTF-8

headius requested changes Jul 8, 2021


demon36 added 3 commits August 6, 2021 11:05

implement WStringRef (mapping for wchar_t*), make set() work efficiently with more charsets, fix DirectMemoryIO.getString()

3fb7cec

bug fix for null terminated strings of allocated size less than maxLength

b4ab181

utf-16/32 support for getCharSequence()

d1658b9

demon36 force-pushed the master branch from f4b89a7 to d1658b9 August 12, 2021 17:31

demon36 added 2 commits August 13, 2021 14:05

revert unneeded length change, add WStringRef.reAllocate() & setMaxLength() that works with wide chars, default max length to max int

61ce834

possible fixes for DirectMemoryIO.getString()

9d1d562

demon36 added 2 commits August 17, 2021 16:47

more compliant spacing

b796974

make use of StringUtil.terminatorWidth() for better performance

badc99e

headius requested changes Aug 18, 2021


headius added this to the 2.2.5 milestone Aug 18, 2021

headius modified the milestones: 2.2.5, 2.2.6, 2.2.7 Sep 1, 2021

headius modified the milestones: 2.2.7, 2.2.8 Sep 16, 2021

headius modified the milestones: 2.2.8, 2.2.9 Oct 26, 2021

headius modified the milestones: 2.2.9, 2.2.10 Nov 22, 2021

headius modified the milestones: 2.2.10, 2.2.11 Dec 1, 2021
