sqlite3: fix parser #661
- Make all database page types parseable
- Add cell content overflow handling
- Add UTF-16 text encoding support
- Make free page list and overflow page lists accessible
database/sqlite3.ksy
instances:
  len_page:
    value: 'len_page_mod == 1 ? 0x10000 : len_page_mod'
  pages:
    type: page(_index + 1, header.page_size * _index)
    repeat: expr
    repeat-expr: header.num_pages
types:
  page:
    params:
      - id: page_number
        type: s4
      - id: ofs_body
        type: s4
    instances:
      page_index:
        value: 'page_number - 1'
      body:
        pos: ofs_body
        size: _root.header.page_size
        type:
          switch-on: '(page_index == _root.header.idx_lock_byte_page ? 0 : page_index >= _root.header.idx_first_ptrmap_page and page_index <= _root.header.idx_last_ptrmap_page ? 1 : 2)'
          cases:
            0: lock_byte_page(page_number)
            1: ptrmap_page(page_number)
            # TODO: Free pages and cell overflow pages are incorrectly interpreted as btree pages
            # This is unfortunate, but unavoidable since there's no way to recognize these types at
            # this point in the parser.
            2: btree_page(page_number)
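For readers less familiar with the Kaitai expression language, the three expressions in the hunk above can be read as the following Python. This is only a sketch of the logic, not code generated from the .ksy file. The function names are made up for illustration, and the arguments stand in for the header fields referenced in the diff.

```python
def effective_page_size(len_page_mod):
    # Mirrors: len_page_mod == 1 ? 0x10000 : len_page_mod
    return 0x10000 if len_page_mod == 1 else len_page_mod


def page_offset(page_number, page_size):
    # Page numbers are 1-based, so page 1 starts at byte 0.
    # Mirrors: header.page_size * _index, with page_number = _index + 1
    return (page_number - 1) * page_size


def classify_page(page_index, idx_lock_byte_page,
                  idx_first_ptrmap_page, idx_last_ptrmap_page):
    # Mirrors the nested ternary in switch-on (cases 0, 1, 2).
    if page_index == idx_lock_byte_page:
        return "lock_byte_page"
    if idx_first_ptrmap_page <= page_index <= idx_last_ptrmap_page:
        return "ptrmap_page"
    # Fallback case: as the TODO notes, free pages and overflow pages
    # also end up here because they cannot be recognized at this point.
    return "btree_page"
```

The fallback branch is the one the TODO comment is about: anything that is neither the lock-byte page nor a pointer-map page is treated as a btree page.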
@generalmimon no luck with lazy db.pages
this still loops over all pages when I read db.pages[0]
@property
def pages(self):
    if hasattr(self, "_m_pages"):
        return self._m_pages
    self._m_pages = []
    for i in range(self.header.num_pages):
        self._m_pages.append(
            Sqlite3.Page(
                (i + 1), (self.header.page_size * i), self._io, self, self._root
            )
        )
see also kaitai-io/kaitai_struct#133
> this still loops all pages when i read db.pages[0]
Yes, it creates the objects but does not parse them.
If you don't want to even create the empty objects (because the total memory usage of too many empty objects would be too high), you can provide an "unused" page type and let the users of the parser instantiate it themselves for the page number they want (and even dispose of the object afterwards to keep the memory usage low). This approach is essentially explained in https://stackoverflow.com/a/73332294/12940655.
KS unfortunately doesn't support truly unused types very well at the moment (i.e. when you define a type under types but don't use it absolutely anywhere), but this can be easily worked around by the if: false trick as explained in the linked SO post.
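On the consumer side, the "instantiate it yourself" idea could look roughly like the wrapper below. This is a minimal sketch and not part of Kaitai Struct or of this PR. LazyPages, make_page and forget are hypothetical names, and make_page is assumed to be a callable that wraps whatever constructor call the generated parser exposes (for example the Sqlite3.Page(...) call quoted earlier in this thread).

```python
class LazyPages:
    """Sequence-like view that builds a page object only when it is indexed.

    Hypothetical helper: `make_page` is any callable taking a 0-based page
    index and returning a page object.
    """

    def __init__(self, num_pages, make_page):
        self._num_pages = num_pages
        self._make_page = make_page
        self._cache = {}

    def __len__(self):
        return self._num_pages

    def __getitem__(self, index):
        if not isinstance(index, int):
            raise TypeError("page index must be an int")
        if index < 0:
            index += self._num_pages
        if not 0 <= index < self._num_pages:
            raise IndexError("page index out of range")
        if index not in self._cache:
            # Nothing is created until a specific page is requested.
            self._cache[index] = self._make_page(index)
        return self._cache[index]

    def forget(self, index):
        # Drop a cached page so it can be garbage-collected.
        self._cache.pop(index, None)
```

With this shape, reading pages[0] creates exactly one page object, and forget(0) releases it again, which matches the memory concern described above.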
This reverts commit 59110ce. kaitai-struct-compiler still generates an eager array
I tried to get Kaitai Struct working in Lua and ended up here. It seems PR640 started work on fixing some of the problems, but it was never merged. In any case, I was able to parse my example db to some extent. It should be mentioned more clearly that since the pages are lazy loaded you will never see any actual pages, just the

In any case, I am having issues with overflow pages. My example DB has a page size of 4096 and page 2 is of type

How do I access this offset content? I stepped through the parser, and my crude understanding is that there should be a

When I look at page 6, it has 99 cells and each contains one property

What's the state of this PR? Can I do anything to help it along?

edit: in fact, I would be happy to contribute a Lua tutorial, if I can get it working.
abandoned
add example code (python, lua, ...)
I needed this sqlite parser for my pysqlite3, to parse a partially downloaded sqlite database. Kaitai does not support such lazy parsers; Kaitai is just a code generator.
Overflow means the data is stored on multiple pages.
cell.ofs_content is also used in connection._row_locations
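To make "stored on multiple pages" concrete: in the SQLite file format, each overflow page begins with a 4-byte big-endian number of the next overflow page (0 marks the end of the chain), and the rest of the usable page holds payload bytes. The sketch below follows such a chain on a raw byte buffer. It is an illustration only and does not use the parser from this PR or pysqlite3. The function name and parameters are made up, and `reserved` is assumed to be the per-page reserved space from the database header.

```python
import struct


def read_overflow_chain(db, page_size, first_page, remaining, reserved=0):
    """Collect `remaining` payload bytes from a chain of overflow pages.

    `db` is the whole database file as bytes; page numbers are 1-based.
    """
    usable = page_size - reserved
    out = bytearray()
    page_number = first_page
    seen = set()
    while page_number != 0 and remaining > 0:
        if page_number in seen:
            raise ValueError("cycle in overflow chain")
        seen.add(page_number)
        start = (page_number - 1) * page_size
        page = db[start:start + usable]
        if len(page) < 4:
            raise ValueError("overflow page %d is missing" % page_number)
        # First 4 bytes: number of the next overflow page, 0 if this is the last.
        (next_page,) = struct.unpack(">I", page[:4])
        chunk = page[4:4 + remaining]
        out += chunk
        remaining -= len(chunk)
        page_number = next_page
    return bytes(out)
```

The first overflow page number and the number of bytes still missing would come from the cell on the btree page; how those are exposed depends on the parser in use.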
Yeah, I am trying to parse 'app size' sqlite dbs on limited hardware so lazy parsing isn't an issue. That's an interesting application you have, thanks for the pointers, I will have a look.
continue #640
lazy pages based on #640 (comment)
rename fields from *_index to idx_*, vaguely based on the style guide via #640 (comment). These are not physical offsets, so I'm not using ofs_*.
migration script:
will squash commits later