add blob.length property #225
The whole |
I'm hesitant to modify the |
Modifying `__len__`:

```python
def __len__(self) -> int:
    return hb_blob_get_length(self._hb_blob)
```

code:

```python
import os

import uharfbuzz


def test_len(fontPath):
    fontSize = os.path.getsize(fontPath)
    print(f"fontSize: {fontSize}")
    blob = uharfbuzz.Blob.from_file_path(fontPath)
    len_blob = len(blob)
    print(f"len(blob): {len_blob}")
    face = uharfbuzz.Face(blob)
    len_face_blob = len(face.blob)
    print(f"len(face.blob): {len_face_blob}")


def test_datalen(fontPath):
    fontSize = os.path.getsize(fontPath)
    print(f"fontSize: {fontSize}")
    blob = uharfbuzz.Blob.from_file_path(fontPath)
    len_blob = len(blob.data)
    print(f"len(blob.data): {len_blob}")
    face = uharfbuzz.Face(blob)
    len_face_blob = len(face.blob.data)
    print(f"len(face.blob.data): {len_face_blob}")
```

result:

```
test_len:
fontSize: 16998496
len(blob): 16998496
len(face.blob): 16998496
test_datalen:
fontSize: 16998496
len(blob.data): 0
len(face.blob.data): 16998496
```
If |
It seems to be working now, although reading `data` inevitably touches the hard drive, since the data is returned as bytes.

```python
import os

import uharfbuzz


def test_len(fontPath):
    fontSize = os.path.getsize(fontPath)
    print(f"fontSize: {fontSize}")
    blob = uharfbuzz.Blob.from_file_path(fontPath)
    print(f"len(blob): {len(blob)}")
    face = uharfbuzz.Face(blob)
    print(f"len(face.blob): {len(face.blob)}")


def test_datalen(fontPath):
    fontSize = os.path.getsize(fontPath)
    print(f"fontSize: {fontSize}")
    blob = uharfbuzz.Blob.from_file_path(fontPath)
    print(f"len(blob.data): {len(blob.data)}")
    face = uharfbuzz.Face(blob)
    print(f"len(face.blob.data): {len(face.blob.data)}")


def test_face_blob(fontPath):
    fontSize = os.path.getsize(fontPath)
    print(f"fontSize: {fontSize}")
    # Build the face from in-memory bytes instead of a file-backed blob.
    with open(fontPath, "rb") as f:
        fontBytes = f.read()
    face = uharfbuzz.Face(fontBytes)
    print(f"len(face.blob): {len(face.blob)}")
    print(f"len(face.blob.data): {len(face.blob.data)}")
```

result:

```
test_len:
fontSize: 16998496
len(blob): 16998496
len(face.blob): 16998496
test_datalen:
fontSize: 16998496
len(blob.data): 16998496
len(face.blob.data): 16998496
test_face_blob:
fontSize: 16998496
len(face.blob): 16998496
len(face.blob.data): 16998496
```
If the blob is created from a file path, HarfBuzz will mmap the file, so reading the data will indeed involve file I/O. |
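To illustrate the point about mmap, here is a minimal sketch using Python's standard `mmap` module as an analogy. It is not HarfBuzz's or uharfbuzz's code, and the helper names `write_sample`, `mapped_length`, and `mapped_bytes` are made up for this example. The length of a mapping is known as soon as the file is mapped, whereas copying the mapping into a `bytes` object has to read every page of the file.

```python
import mmap
import os
import tempfile


def write_sample(size: int) -> str:
    """Create a temporary file of `size` zero bytes and return its path (hypothetical helper)."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        f.write(b"\x00" * size)
    return path


def mapped_length(path: str) -> int:
    """Return the length of a read-only mapping without copying its contents."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return len(view)


def mapped_bytes(path: str) -> bytes:
    """Copy the whole mapping into a bytes object, which reads every page."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return view[:]
```

If only the size is needed, asking for the length is the cheaper call; materialising the data is what causes the file reads discussed above.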
Yes, but now the build is failing, and it's strange because I can't find where the issue is. Is it possible that |
Instead of keeping blob data on the Python side and risking it getting out of sync with the C side, always use the C API to get the blob length and data.
Thanks! |
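The design choice in that commit message can be sketched in plain Python. This is only an illustration under assumed names: `NativeBuffer` stands in for the C-side object, and `CachingBlob` and `DelegatingBlob` are hypothetical wrappers, not the actual uharfbuzz classes. The caching wrapper reports whatever its Python-side copy holds, while the delegating wrapper always asks the underlying object.

```python
class NativeBuffer:
    """Hypothetical stand-in for a C-side object that owns the real bytes."""

    def __init__(self, payload: bytes):
        self._payload = payload

    def get_length(self) -> int:
        return len(self._payload)

    def get_data(self) -> bytes:
        return self._payload


class CachingBlob:
    """Keeps a separate Python-side copy that may never be filled in."""

    def __init__(self, native: NativeBuffer, data: bytes = b""):
        self._native = native
        self._data = data  # stays empty if the native side loaded the bytes itself

    def __len__(self) -> int:
        return len(self._data)


class DelegatingBlob:
    """Holds no copy; every query goes to the native object."""

    def __init__(self, native: NativeBuffer):
        self._native = native

    def __len__(self) -> int:
        return self._native.get_length()

    @property
    def data(self) -> bytes:
        return self._native.get_data()
```

With a four-byte `NativeBuffer`, `len(CachingBlob(native))` is 0 when no Python-side copy was supplied, whereas `len(DelegatingBlob(native))` is 4, which mirrors the `0` versus full-length results reported in this thread.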
Looking forward to the new version. |
I found that uharfbuzz already binds the `hb_blob_get_length` API, but it doesn't use it and instead returns the length using `return len(self._data)`. This leads to an issue: after creating a blob using `hb.Blob.from_file_path`, the length obtained by `len(blob)` or `len(blob.data)` is `0`. Only after creating `hb.Face(blob)` can the correct length be obtained through `len(face.blob)` or `len(face.blob.data)`. This causes a significant amount of disk I/O when reading many files.

result:
IO: