Skip to content

Commit

Permalink
Import "Where Is It?" data - working draft
Browse files Browse the repository at this point in the history
doc update
  • Loading branch information
xy committed Jan 21, 2024
1 parent 8e131f4 commit 83ca1b3
Show file tree
Hide file tree
Showing 6 changed files with 416 additions and 154 deletions.
8 changes: 8 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,12 @@ Portable executable packages created with [PyInstaller](https://pyinstaller.org/

https://github.com/PJDude/librer/releases

## MAJORGEEKS review:
https://www.majorgeeks.com/files/details/librer.html

## SOFTPEDIA review:
https://www.softpedia.com/get/Others/File-CD-DVD-Catalog/Librer.shtml

## [Tutorial](./info/tutorial.md) ##

## Guidelines for crafting custom data extractors
Expand All @@ -40,6 +46,8 @@ Custom data extractor is a command that can be invoked with a single parameter -
## Portability
**librer** writes log files, configuration and record files in runtime. Default location for these files is **logs** and **data** subfolders of **librer** main directory.

## [Importing data from "Where Is It?"](./info/wii_import.md) ##

## Technical information
Record in librer is the result of a single scan operation and is shown as one of many top nodes in the main tree window. Contains a directory tree with collected custom data. It is stored as a single .dat file in librer database directory. Its internal format is optimized for security, fast initial access and maximum compression (just check :)) Every section is a python data structure serialized by [pickle](https://docs.python.org/3/library/pickle.html) and compressed separately by [Zstandard](https://pypi.org/project/zstandard/) algorithm. The record file, once saved, is never modified afterward. It can only be deleted upon request or exported. All record files are independent of each other. Fuzzy matching is implemented using the SequenceMatcher function provided by the [difflib](https://docs.python.org/3/library/difflib.html) module. Searching records is performed as a separate subprocess for each record. The number of parallel searches is limited by the CPU cores.

Expand Down
Binary file added info/wii_01.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added info/wii_02.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
22 changes: 22 additions & 0 deletions info/wii_import.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
Librer allows to import data from "Where Is It?" software indirectly by reading it's xml report file.

### This feature is under development

How to import data ? You need to save information about all files catalogued by "Where Is It?"

in "Where Is It?":
- Start Report Generator Wizard
- on "Select report type" page select the Pre-defined report "List of all cataloged files and folders on selected disks"
![image info](./wii_01.png)

- In the Report Generator window, select export to "XML file". All options set as below:
![image info](./wii_02.png)

- clik Export and save XML file

in Librer use File menu action: 'Import "Where Is It?" xml ...'

## unregistered Where Is It? version issue
Unregistered version of WII seems to pollute its own report by replacing some data with ```*** DEMO ***``` string. Imported data from such report will be obviously incomplete, however the ```*** DEMO ***``` string seems to be inserted randomly, so in sequential writes it appears in different sections. After couple of exports, with a little luck pool of report files will contain all data scattered in different files. Import all that reports at once in single shot (!) to merge the data into complete dataset. To do that use multiselection on import dialog. Incomplete data will be ignored and will not apear in librer records.


293 changes: 289 additions & 4 deletions src/core.py
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,8 @@
if windows:
from signal import CTRL_C_EVENT

from re import compile as re_compile

from pickle import dumps,loads
from zstandard import ZstdCompressor,ZstdDecompressor
from pympler.asizeof import asizeof
Expand Down Expand Up @@ -288,6 +290,8 @@ def __init__(self,label='',scan_path=''):

self.zipinfo = {}

self.scanning_time=0

self.compression_level=0

self.creation_os,self.creation_host = f'{platform_system()} {platform_release()}',platform_node()
Expand Down Expand Up @@ -375,6 +379,22 @@ def compress_with_header_update_wrapp(data,datalabel):

compress_with_header_update_wrapp(self.header,'header')

if False:
with open(file_path+'.debug.txt', "w") as txt_file:
#txt_file.write(str(self.header) + '\n')

txt_file.write('-------------------------------\nheader:\n\n')
for k,v in vars(self.header).items():
txt_file.write(f' {k}:{v}\n')

txt_file.write('\n\n-------------------------------\nfilestructure:\n\n')
txt_file.write(str(self.filestructure))
txt_file.write('\n\n-------------------------------\nfilenames:\n\n')
txt_file.write(str(self.filenames))
if self.customdata:
txt_file.write('\n-------------------------------\ncustomdata\n')
txt_file.write(self.customdata)

self.prepare_info()

print_func(['save','finished'],True)
Expand Down Expand Up @@ -765,10 +785,14 @@ def tupelize_rec(self,scan_like_data,results_queue_put):
try:
(size,is_dir,is_file,is_symlink,is_bind,has_files,mtime) = items_list[0:7]

elem_index = 7
if has_files:
sub_dict = items_list[elem_index]
elem_index+=1
try:
elem_index = 7
if has_files:
sub_dict = items_list[elem_index]
elem_index+=1
except:
#print('f1 error')
sub_dict={}

try:
info_dict = items_list[elem_index]
Expand Down Expand Up @@ -1278,6 +1302,266 @@ def read_records_pre(self):
self.log.error('list read error:%s' % e )
return (0,0)

def get_wii_files_dict(self,import_filenames):
re_obj_item = re_compile(r'<ITEM ItemType="([^"]+)">')
re_obj_item_end = re_compile(r'/ITEM>')

re_obj_name = re_compile(r'<NAME>(.+)</NAME>')
re_obj_ext = re_compile(r'<EXT>(.+)</EXT>')
re_obj_size = re_compile(r'<SIZE>(.+)</SIZE>')
re_obj_date = re_compile(r'<DATE>(.+)</DATE>')
re_obj_disk_name = re_compile(r'<DISK_NAME>(.+)</DISK_NAME>')
re_obj_disk_type = re_compile(r'<DISK_TYPE>(.+)</DISK_TYPE>')
re_obj_disk_num = re_compile(r'<DISK_NUM>(.+)</DISK_NUM>')
re_obj_disk_location = re_compile(r'<DISK_LOCATION>(.+)</DISK_LOCATION>')
re_obj_path = re_compile(r'<PATH>(.+)</PATH>')
re_obj_time = re_compile(r'<TIME>(.+)</TIME>')
re_obj_crc = re_compile(r'<CRC>(.+)</CRC>')
re_obj_category = re_compile(r'<CATEGORY>(.+)</CATEGORY>')
re_obj_flag = re_compile(r'<FLAG>(.+)</FLAG>')
re_obj_desc = re_compile(r'<DESCRIPTION>(.+)</DESCRIPTION>')
re_obj_desc_begin = re_compile(r'<DESCRIPTION>(.*)')
re_obj_desc_end = re_compile(r'(.*)</DESCRIPTION>')

l=0
in_item=False
in_description=False

demo_str = '*** DEMO ***'

wii_paths_dict = {}
wii_path_tuple_to_data = {}

try:
filenames_set=set()
for import_filename in import_filenames:
with open(import_filename,"rt", encoding='utf-8', errors='ignore') as f:
for line in f:
if match := re_obj_item.search(line):
item={}
item['type']=match.group(1)
item['ext'] = ''
item['description'] = []

in_item=True
in_description=False


elif match := re_obj_item_end.search(line):
in_item=False
in_description=False

if item['disk_name']!=demo_str and item['path']!=demo_str and item['name']!=demo_str and item['ext']!=demo_str:
#path_splitted = [item['disk_name'] + ':'] + item['path'].strip('\\').split('\\') + [item['name']]
path_splitted = [item['disk_name']] + item['path'].strip('\\').split('\\') + [item['name'] + item['ext']]
path_splitted = [path_elem for path_elem in path_splitted if path_elem]

path_splitted_len = len(path_splitted)
path_splitted_tuple = tuple(path_splitted)

#print(f'{path_splitted=}')

next_dict = wii_paths_dict
for ps_i in range(path_splitted_len):
filename = path_splitted[ps_i]
filenames_set.add(filename)

if filename not in next_dict:
next_dict[filename] = {}

next_dict = next_dict[filename]

size = item['size']
is_dir = bool(item['type']=="Folder")
is_file = bool(item['type']=="File")
is_symlink=False
is_bind=False
has_files=False
mtime=1234

#scan_like_data tuple
sld_tuple = tuple([size,is_dir,is_file,is_symlink,is_bind,has_files,mtime])
#print(f'{path_splitted=} : {sld_tuple=}')

wii_path_tuple_to_data[path_splitted_tuple] = sld_tuple

elif match := re_obj_name.search(line):
item['name']=match.group(1)
elif match := re_obj_ext.search(line):
ext = match.group(1)
if ext!=demo_str:
item['ext']='.' + ext
elif match := re_obj_size.search(line):
try:
item['size']=int(match.group(1))
except:
#print(f'wrong size: {match.group(1)}')
item['size']=333222
elif match := re_obj_date.search(line):
item['date']=match.group(1)
elif match := re_obj_disk_name.search(line):
item['disk_name']=match.group(1)
elif match := re_obj_disk_type.search(line):
item['disk_type']=match.group(1)
elif match := re_obj_disk_num.search(line):
item['disk_num']=match.group(1)
elif match := re_obj_disk_location.search(line):
item['disk_loc']=match.group(1)
elif match := re_obj_path.search(line):
item['path']=match.group(1)
elif match := re_obj_time.search(line):
item['time']=match.group(1)
elif match := re_obj_crc.search(line):
item['crc']=match.group(1)
elif match := re_obj_category.search(line):
item['category']=match.group(1)
elif match := re_obj_flag.search(line):
item['flag']=match.group(1)
elif match := re_obj_desc.search(line):
item['desc']=match.group(1)
elif match := re_obj_desc_begin.search(line):
in_description=True
item['description'].append(match.group(1))
elif match := re_obj_desc_end.search(line):
in_description=False
item['description'].append(match.group(1))
item['description'] = tuple(item['description'])
elif in_description:
item['description'].append(line)
elif lstrip:=line.strip():
#pass
print('IGNORING:',lstrip)

l+=1

#<ITEM ItemType="Folder">
# <NAME>$Recycle.Bin</NAME>
# <SIZE>1918697428</SIZE>
# <DATE>2023-04-27</DATE>
# <DISK_NAME>c</DISK_NAME>
# <DISK_TYPE>Hard disk</DISK_TYPE>
# <PATH>\</PATH>
# <DESCRIPTION><![CDATA[*** DEMO ***]]></DESCRIPTION>
# <DISK_NUM>1</DISK_NUM>
# <TIME>12:55:08</TIME>
# <CRC>0</CRC>
#</ITEM>

return filenames_set,wii_path_tuple_to_data,wii_paths_dict
except:
return {{},{},{}}

def wii_data_to_scan_like_data(self,path_list,curr_dict_ref,scan_like_data):
path_list_tuple = tuple(path_list)

anything = False
try:
for name,val in curr_dict_ref.items():

anything = True

dict_entry={}

sub_path_list = path_list + [name]
sub_path_list_tuple = tuple(sub_path_list)

try:
size,is_dir,is_file,is_symlink,is_bind,has_files,mtime = self.wii_path_tuple_to_data[sub_path_list_tuple]
except Exception as e1:
print(f'{e1=}')
#tylko topowy ?
size=1
is_dir = True
is_file = False
is_symlink = False
is_bind = False
mtime = 4568

if is_dir:
self.wii_data_to_scan_like_data(sub_path_list,val,dict_entry)

if dict_entry:
has_files = True
else:
has_files = False

temp_list_ref = scan_like_data[name] = [size,is_dir,is_file,is_symlink,is_bind,has_files,mtime]

if dict_entry:
temp_list_ref.append(dict_entry)

except Exception as e:
print('wii_data_to_scan_like_data error:',e)
pass

return anything

def import_records_wii_scan(self,import_filenames):
self.log.info(f'import_records_wii:{",".join(import_filenames)}')

demo_str = '*** DEMO ***'

quant = len(import_filenames)

filenames_set,wii_path_tuple_to_data,wii_paths_dict = self.get_wii_files_dict(import_filenames)

quant_files,quant_folders = 0,0
for k,v in wii_path_tuple_to_data.items():
size,is_dir,is_file,is_symlink,is_bind,has_files,mtime = v
if is_dir:
quant_folders+=1
elif is_file:
quant_files+=1

return quant_files,quant_folders,filenames_set,wii_path_tuple_to_data,wii_paths_dict

def import_records_wii_do(self,quant_files,quant_folders,filenames_set,wii_path_tuple_to_data,wii_paths_dict,update_callback):
import_res=[]

self.wii_path_tuple_to_data = wii_path_tuple_to_data
self.wii_paths_dict=wii_paths_dict

scan_like_data={}
self.wii_data_to_scan_like_data([],self.wii_paths_dict,scan_like_data)

new_record = self.create()
new_record.filenames = tuple(sorted(list(filenames_set)))
new_record.header.label = 'WII import'
new_record.header.scan_path = 'WII import'
new_record.filenames_helper = {fsname:fsname_index for fsname_index,fsname in enumerate(new_record.filenames)}

new_record.header.quant_files = quant_files
new_record.header.quant_folders = quant_folders

##################################
size,mtime = 0,0
is_dir = True
is_file = False
is_symlink = False
is_bind = False
has_cd = False
has_files = True
cd_ok = False
has_crc = False

new_record.header.references_names=0
new_record.header.references_cd=0

code = LUT_encode[ (is_dir,is_file,is_symlink,is_bind,has_cd,has_files,cd_ok,has_crc,False,False) ]
new_record.filestructure = ('',code,size,mtime,new_record.tupelize_rec(scan_like_data,print))

new_file_path = sep.join([self.db_dir,f'wii.{int(time())}.dat'])
print(f'{new_file_path=}')

new_record.save(print,file_path=new_file_path)

update_callback(new_record)

if import_res:
return '\n'.join(import_res)
else:
return None

def import_records(self,import_filenames,update_callback):
self.log.info(f"import {','.join(import_filenames)}")

Expand Down Expand Up @@ -1510,6 +1794,7 @@ def find_results_clean(self):
stdout_files_cde_size_sum=0

########################################################################################################################

def create_new_record(self,temp_dir,update_callback):
self.log.info(f'create_new_record')
self_log_info = self.log.info
Expand Down
Loading

0 comments on commit 83ca1b3

Please sign in to comment.