Access to undefined reference #12

Open
Cody-G opened this issue Jan 3, 2016 · 8 comments

Cody-G commented Jan 3, 2016

When using multiple workers, I get this error fairly often. Rerunning the same script works after one or two tries. My guess is that multiple workers are trying to access the image (in this case a subarray into a Bidirectional image) simultaneously and that, depending on the timing, the reference is sometimes unavailable. I'm not sure how to debug this. For now, rerunning the script seems to work, but I'm happy to try to fix this if someone can suggest a direction.

        From worker 2:  Worker 2 is working on 1
        From worker 4:  Worker 4 is working on 3
        From worker 3:  Worker 3 is working on 2
ERROR: LoadError: On worker 2:
UndefRefError: access to undefined reference
 in getindex at /home/cody/git/juliapackages_new/v0.4/OCPI/src/Bidirectional.jl:280
 in copy! at multidimensional.jl:582
 in getindex at subarray.jl:607
 in worker at /home/cody/git/juliapackages_new/v0.4/BlockRegistrationScheduler/src/RegisterWorkerAperturesMismatch.jl:124
 in worker at /home/cody/git/juliapackages_new/v0.4/BlockRegistrationScheduler/src/RegisterWorkerShell.jl:148
 in anonymous at multi.jl:907
 in run_work_thunk at multi.jl:645
 [inlined code] from multi.jl:907
 in anonymous at task.jl:63
 in remotecall_fetch at multi.jl:731
 [inlined code] from /home/cody/git/juliapackages_new/v0.4/BlockRegistrationScheduler/src/RegisterDriver.jl:84
 in anonymous at task.jl:447

...and 2 other exceptions.

while loading /home/cody/git/codyfunc/jfunc/scripts/registration/new_registration_code/first_GC6f_fish/nonrigid_odd/nonrigid_odd_mismatch.jl, in expression starting on line 155
Member

timholy commented Jan 3, 2016

Is there any chance you can capture this with --inline=no? That will slow things down tremendously, so it may not be practical. Unless you can trap it using a really tiny sub-image?

Member

timholy commented Jan 3, 2016

I should clarify that part of the reason I asked this is that line 280 is this line, which doesn't make much sense as the source of the problem.

Author

Cody-G commented Jan 3, 2016

Oops, I had some local changes (just comments, so not relevant to the error). So for me line 280 corresponds to this line.

I'm also running it without inlining now to try to catch the error in case it's helpful.

Author

Cody-G commented Jan 4, 2016

Okay, I did trigger the error without inlining. Most, but not all, of the line numbers are the same. I also noticed some warnings, which I think are unrelated, but I'm pasting them too:

WARNING: Module Reexport uuid did not match cache file
WARNING: Module Reexport uuid did not match cache file
WARNING: deserialization checks failed while attempting to load cache from /home/cody/git/juliapackages_new/lib/v0.4/RegisterMismatchCuda.ji
WARNING: deserialization checks failed while attempting to load cache from /home/cody/git/juliapackages_new/lib/v0.4/RegisterMismatchCuda.ji
WARNING: Module Reexport uuid did not match cache file
WARNING: deserialization checks failed while attempting to load cache from /home/cody/git/juliapackages_new/lib/v0.4/RegisterMismatchCuda.ji
        From worker 2:  Worker 2 is working on 1
        From worker 3:  Worker 3 is working on 2
        From worker 4:  Worker 4 is working on 3
ERROR: LoadError: On worker 2:
UndefRefError: access to undefined reference
 in getindex at /home/cody/git/juliapackages_new/v0.4/OCPI/src/Bidirectional.jl:280
 in copy! at multidimensional.jl:582
 in getindex at subarray.jl:607
 in worker at /home/cody/git/juliapackages_new/v0.4/BlockRegistrationScheduler/src/RegisterWorkerAperturesMismatch.jl:124
 in worker at /home/cody/git/juliapackages_new/v0.4/BlockRegistrationScheduler/src/RegisterWorkerShell.jl:148
 in anonymous at multi.jl:907
 in run_work_thunk at multi.jl:645
 [inlined code] from multi.jl:907
 in anonymous at task.jl:63
 in remotecall_fetch at multi.jl:731
 in remotecall_fetch at multi.jl:734
 [inlined code] from /home/cody/git/juliapackages_new/v0.4/BlockRegistrationScheduler/src/RegisterDriver.jl:84
 in anonymous at task.jl:447

...and 2 other exceptions.

Member

timholy commented Jan 4, 2016

That line number makes much more sense: it suggests that one of the fields of the ArrayZInterp is undefined. But the weird thing is that neither ArrayZInterp nor ArrayZSeq has an inner constructor, and without an inner constructor I don't think it's possible to have an undefined field. So I'm still puzzled.
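To make the inner-constructor point concrete, here is a minimal sketch in current Julia syntax (`mutable struct` rather than the 0.4 `type`). `NoInner` and `WithInner` are hypothetical types invented for illustration; they are not the OCPI definitions.

```julia
# Hypothetical types for illustration only; not the definitions used in OCPI.

# No inner constructor: the default constructor requires a value for every field.
mutable struct NoInner
    data::Vector{Float64}
end

# An inner constructor that calls `new()` with no arguments leaves `data` undefined.
mutable struct WithInner
    data::Vector{Float64}
    WithInner() = new()
end

a = NoInner([1.0, 2.0])
b = WithInner()

println("NoInner data defined?   ", isdefined(a, :data))
println("WithInner data defined? ", isdefined(b, :data))

# Reading the undefined field raises the same kind of error as in the backtrace.
threw = try
    b.data
    false
catch err
    err isa UndefRefError
end
println("UndefRefError on access? ", threw)
```

If the real types behave like `NoInner`, an undefined field would have to come from somewhere other than ordinary construction, which is what makes the backtrace puzzling.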

How about trying this: define a copy! method with signature

copy!{T}(dest::AbstractArray{T,4}, src::ArrayZInterp{T})

and first check that this is what gets called (e.g., make the body something like error("here we are!") and check that the backtrace is the same starting with that getindex at subarray.jl:607). Then you should be able to use this to learn more about what's happening. If you haven't used it before, isdefined could be quite handy, e.g., isdefined(src, :data) and isdefined(src, :framenumber).
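A rough sketch of that diagnostic, written in current Julia syntax (`where T`, `copyto!`) rather than the 0.4 syntax used above. `FakeZInterp`, `undefined_fields`, and `checked_copy!` are hypothetical names made up for this example; only the field names `data` and `framenumber` come from the suggestion above, and the field types are assumptions.

```julia
# Hypothetical stand-in for ArrayZInterp; the real type and its field types may differ.
mutable struct FakeZInterp{T}
    data::Array{T,4}
    framenumber::Vector{Int}
    # Incomplete constructor, used here only so the example can produce undefined fields.
    FakeZInterp{T}() where {T} = new{T}()
end

# Names of the fields of `x` that are currently undefined.
undefined_fields(x) = [f for f in fieldnames(typeof(x)) if !isdefined(x, f)]

# Check the source before copying, so the failure names the undefined field(s).
function checked_copy!(dest::AbstractArray{T,4}, src::FakeZInterp{T}) where {T}
    bad = undefined_fields(src)
    isempty(bad) || error("undefined fields in source: $(bad)")
    copyto!(dest, src.data)
    return dest
end

src = FakeZInterp{Float64}()
dest = zeros(Float64, 2, 2, 2, 2)

# First attempt fails and reports the undefined fields.
message = try
    checked_copy!(dest, src)
    ""
catch err
    sprint(showerror, err)
end
println(message)

# After the fields are assigned, the copy goes through.
src.data = ones(Float64, 2, 2, 2, 2)
src.framenumber = [1, 2]
checked_copy!(dest, src)
println(sum(dest))
```

In the real code the check would go inside the specialized `copy!` method, so the report is produced on the worker where the error occurs.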

Long-term, we might want to define such a copy! method anyway, since it could be more efficient than looping over scalar indexes. But that's a topic for another day.

Member

timholy commented Jan 4, 2016

Just noticed that it's quite possible that --inline=no doesn't get passed to the workers. You could use

addprocs(n; exeflags=`--inline=no`)
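One way to confirm that the flag actually reached the workers is sketched below in current Julia, where `addprocs` lives in the `Distributed` standard library. It reads `Base.JLOptions().can_inline`, which is an internal, undocumented field, so treat this as an assumption that may not hold across Julia versions; a value of 0 is taken to mean inlining is disabled.

```julia
using Distributed

# Start two workers with inlining disabled.
pids = addprocs(2; exeflags=`--inline=no`)

# Ask each worker for its own inlining option.
# Base.JLOptions() is internal API; assumed here to expose `can_inline`.
flags = Dict(p => remotecall_fetch(() -> Base.JLOptions().can_inline, p) for p in pids)

for (p, flag) in flags
    println("worker ", p, ": can_inline = ", flag)
end

# Shut the extra workers down again.
rmprocs(pids)
```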

Member

timholy commented Jan 4, 2016

I should also say that now the backtrace makes sense, so running with --inline=no is a moot point.

One other thought: try changing those types to immutable. They can't have undefined fields.

Member

timholy commented May 31, 2017

I wonder if this could by any chance be the same as #36, with a different error message?
