Comparison of performance among different linearised FE bases #1023
Hi @wei3li, thank you very much for this! The only comment I have is that you are not using the [...]

Regardless, let me go through a first round of optimisations to try to improve some of these results. There are a couple of things in the current macro elements that I had already flagged as not ideal. In particular, I had not optimised the autodiff. I think I can bring quite a lot of these times down. I'll let you know when I think we can run the benchmarks again.
Update on Sep 10, 2024: the macro-element implementation has been optimised by @JordiManyer. Three linearised bases are compared, each equipped with the same total number of integration points over the whole mesh.

Trial order

Matrix and vector assembly
Linear system solver
Gradient computation
Now, for the macro-element-based linearised test space, only the gradient computation is a bit slow.

Appendix

Script for the experiments

using Gridap, Gridap.Adaptivity, BenchmarkTools
import FillArrays: Fill
import Printf: @printf
import Random

include("LinearMonomialBases.jl")  # local helper definitions (not shown in this issue)

order, ncell = parse(Int, ARGS[1]), parse(Int, ARGS[2])

# Manufactured solution and corresponding source term
u((x, y)) = sin(3.2x * (x - y)) * cos(x + 4.3y) + sin(4.6 * (x + 2y)) * cos(2.6(y - 2x))
f(x) = -Δ(u)(x)

bc_tags = Dict(:dirichlet_tags => "boundary")

model = CartesianDiscreteModel((0, 1, 0, 1), (ncell, ncell))
ref_model = Adaptivity.refine(model, order)

reffe1 = ReferenceFE(lagrangian, Float64, order)
U = TrialFESpace(FESpace(model, reffe1; bc_tags...), u)

ΩH, Ωh = Triangulation(model), Triangulation(ref_model)
# High-degree measure on the coarse mesh, used to evaluate the error norms
dΩ⁺ = Measure(ΩH, 15)

# Map cells of the adapted (refined) triangulation to their parent cells in the background model
function Gridap.FESpaces._compute_cell_ids(
    uh,
    strian::BodyFittedTriangulation,
    ttrian::AdaptedTriangulation)
  bgmodel = get_background_model(strian)
  refmodel = get_adapted_model(ttrian)
  @assert bgmodel === get_parent(refmodel)
  refmodel.glue.n2o_faces_map[end]
end

# L2 and H1 norms of u with respect to the measure dΩ
function compute_l2_h1_norms(u, dΩ)
  l2normsqr = ∑(∫(u * u)dΩ)
  ∇u = ∇(u)
  sqrt(l2normsqr), sqrt(l2normsqr + ∑(∫(∇u ⋅ ∇u)dΩ))
end

# Gather evaluation count, memory (MiB) and timing statistics (ms) from a benchmark trial
function extract_benchmark(bmark)
  memo = bmark.memory / (1024^2)
  tmid, tmean = median(bmark.times) / 1e6, mean(bmark.times) / 1e6
  tmin, tmax = extrema(bmark.times) ./ 1e6
  length(bmark.times), memo, tmid, tmean, tmin, tmax
end

# Build the test space and measure for the requested basis, check the FEM errors
# against the interpolation errors, then benchmark the requested target
function run_experiment(test_basis, target)
  if contains(test_basis, "refined")
    # Linear basis on the refined mesh
    reffe2 = ReferenceFE(lagrangian, Float64, 1)
    V = FESpace(ref_model, reffe2; bc_tags...)
    dΩ = Measure(Ωh, order)
  else
    if contains(test_basis, "q1-iso-qk")
      reffe2 = Q1IsoQkRefFE(Float64, order, num_cell_dims(model))
      dΩ = LinearMeasure(ΩH, order)
    elseif contains(test_basis, "macro")
      # Macro-element basis: one reference FE made of order x order linear sub-cells
      rrule = Adaptivity.RefinementRule(QUAD, (order, order))
      poly = get_polytope(rrule)
      reffe = LagrangianRefFE(Float64, poly, 1)
      sub_reffes = Fill(reffe, Adaptivity.num_subcells(rrule))
      reffe2 = Adaptivity.MacroReferenceFE(rrule, sub_reffes)
      quad = Quadrature(rrule, 2)
      dΩ = Measure(ΩH, quad)
    else
      # Standard high-order basis
      reffe2 = reffe1
      dΩ = Measure(ΩH, 3order + 1)
    end
    V = FESpace(model, reffe2; bc_tags...)
  end

  a(u, v) = ∫(∇(v) ⋅ ∇(u))dΩ
  l(v) = ∫(f * v)dΩ
  op = AffineFEOperator(a, l, U, V)
  eh = solve(op) - u

  # Sanity check: the FEM errors should be comparable to the interpolation errors
  fem_l2err, fem_h1err = compute_l2_h1_norms(eh, dΩ⁺)
  ipl_l2err, ipl_h1err = compute_l2_h1_norms(interpolate(u, U) - u, dΩ⁺)
  function err_msg(err_type, fem_err, ipl_err)
    "FEM $(err_type) error $(fem_err) is significantly larger than " *
    "FE interpolation $(err_type) error $(ipl_err)!"
  end
  @assert fem_l2err < ipl_l2err * 2 err_msg("L2", fem_l2err, ipl_l2err)
  @assert fem_h1err < ipl_h1err * 2 err_msg("H1", fem_h1err, ipl_h1err)

  rh = interpolate(x -> 1 + x[1] + x[2], V)
  function j(r)
    ∇r = ∇(r)
    ∫(r * r + ∇r ⋅ ∇r)dΩ
  end

  # Warm-up runs to exclude compilation overhead from the benchmarks
  for _ in 1:3
    solve(AffineFEOperator(a, l, U, V))
    assemble_vector(Gridap.gradient(j, rh), V)
  end

  if contains(target, "assembly")
    bmark = @benchmark AffineFEOperator($a, $l, $U, $V)
  elseif contains(target, "solve")
    bmark = @benchmark solve($op)
  else
    bmark = @benchmark assemble_vector(Gridap.gradient($j, $rh), $V)
  end
  @printf "| %s | %d | %.3f | %.3f | %.3f | %.3f | %.3f |\n" test_basis extract_benchmark(bmark)...
end

BenchmarkTools.DEFAULT_PARAMETERS.seconds = 10
test_bases = ["standard", "refined", "macro", "q1-iso-qk"]
targets = ["assembly", "solve", "gradient"]

println("\norder $order $(ncell)x$(ncell) elements\n")
for target in targets
  println("\n\n$(target) benchmark starting...\n")
  print("| test basis | evaluation count | memory (MiB) |")
  println(" time median (ms) | time mean (ms) | time min (ms) | time max (ms) |")
  println("|:---:|:---:|:---:|:---:|:---:|:---:|:---:|")
  for test_basis in test_bases
    run_experiment(test_basis, target)
  end
end
Thanks, @wei3li! I'll do a second round of optimisations with emphasis on the Jacobian. Most probably it's some autodiff stuff causing issues.
@wei3li I haven't fixed this yet, but I think I know what it is. Just to confirm: could you run your benchmarks with performance mode enabled? That is, start your REPL in the project where you run your benchmarks, call Gridap.Helpers.set_performance_mode() (see the snippet below), then restart Julia and run your benchmarks again. If this solves the issue, it means some [...]

Edit: No straightforward way to avoid this, I'm afraid. Performance mode will disable this check, and everything should work great.
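For reference, the suggested steps as a runnable snippet (the function call is the one quoted above; the mode change only takes effect after restarting Julia):

```julia
using Gridap

# Enable performance mode as suggested above; this disables Gridap's internal
# checks. Restart Julia afterwards so the new mode takes effect.
Gridap.Helpers.set_performance_mode()
```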
Hi @JordiManyer, I reran the experiment with the performance mode enabled. The following are the results:

Matrix and vector assembly
Linear system solver
Gradient computation
The AD part in Gridap is probably very involved...
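For context, the gradient benchmark that remains slow times the assembly of the autodiff gradient of a functional. Below is a self-contained toy sketch of that step, condensed from the appendix script in the update comment above; the mesh size and spaces here are illustrative only, not the benchmark configuration.

```julia
using Gridap

# Illustrative setup only (much smaller than the benchmarked problems)
model = CartesianDiscreteModel((0, 1, 0, 1), (8, 8))
V = FESpace(model, ReferenceFE(lagrangian, Float64, 1))
Ω = Triangulation(model)
dΩ = Measure(Ω, 2)

# Functional used in the "gradient" benchmark of the appendix script
function j(r)
  ∇r = ∇(r)
  ∫(r * r + ∇r ⋅ ∇r)dΩ
end

rh = interpolate(x -> 1 + x[1] + x[2], V)

# Gridap.gradient builds the cell-wise derivative contributions via autodiff;
# assemble_vector then gathers them into the global gradient vector.
b = assemble_vector(Gridap.gradient(j, rh), V)
```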
Background
There are currently five implementations of linearised bases in Gridap: the linear bases on the refined mesh, the macro-element bases recently implemented by @JordiManyer, the hqk-iso-q1 and qk-iso-q1 bases implemented by @amartinhuertas, and the q1-iso-qk bases implemented by @wei3li. This issue presents the experimental results comparing the performance of these bases. As a reference, the standard high-order test basis is also included in the tests.

Experiment settings
The experiments were run on the branch adaptivity with the last commit fedd16d and the branch refined-discrete-models-linearized-fe-space with the last commit 3a1894b.

Two kinds of measures are used for integration: a GenericMeasure on the refined triangulation or a CompositeMeasure.
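For concreteness, here is a condensed sketch of how three of the compared test spaces are constructed, adapted from the appendix script in the update comment above. The values of order and ncell are illustrative, and the q1-iso-qk and hqk-iso-q1 variants rely on additional code from the branches mentioned above that is not shown here.

```julia
using Gridap, Gridap.Adaptivity
import FillArrays: Fill

order, ncell = 2, 64  # illustrative values; the benchmarks sweep over these
model = CartesianDiscreteModel((0, 1, 0, 1), (ncell, ncell))
ref_model = Adaptivity.refine(model, order)

# (1) Standard high-order basis on the coarse mesh (reference case)
V_std = FESpace(model, ReferenceFE(lagrangian, Float64, order))

# (2) Linear basis on the refined mesh
V_refined = FESpace(ref_model, ReferenceFE(lagrangian, Float64, 1))

# (3) Macro-element basis: one reference FE assembled from order x order linear sub-cells
rrule = Adaptivity.RefinementRule(QUAD, (order, order))
sub_reffes = Fill(LagrangianRefFE(Float64, get_polytope(rrule), 1),
                  Adaptivity.num_subcells(rrule))
V_macro = FESpace(model, Adaptivity.MacroReferenceFE(rrule, sub_reffes))
```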
Results
GenericMeasure
Matrix and vector assembly
Linear system solver
Gradient computation
CompositeMeasure
Matrix and vector assembly
Linear system solver
Gradient computation
GenericMeasure
Matrix and vector assembly
Linear system solver
Gradient computation
CompositeMeasure
Matrix and vector assembly
Linear system solver
Gradient computation
Notes & Comments
When the GenericMeasure is used, the linear bases on the refined mesh have the most outstanding performance, followed by the qk-iso-q1 and q1-iso-qk bases, and then the hqk-iso-q1 and macro bases. When the CompositeMeasure is used, the ranking is the same, except that the refined-mesh bases cannot be used.

Appendix
Script for the experiments