Opened 6 years ago
Last modified 3 years ago
#5141 new defect
Issues with the solution of fluid system models with low delta-p components
Reported by: | Francesco Casella | Owned by: | Lennart Ochel |
---|---|---|---|
Priority: | high | Milestone: | |
Component: | Run-time | Version: | |
Keywords: | Cc: | Bernhard Bachmann, Karim Adbdelhak, Andrea Bartolini |
Description (last modified by )
Consider the attached test case. Please note you need the latest version of ThermoPower (https://github.com/casella/ThermoPower/commit/e3d010caa7912e66dea9f17e862ff5a51cb38a0a [commit e3d010]) to avoid issues with min/max attributes.
The test case models a circuit with a pressure source and a pressure sink around atmospheric pressure. In between, steam flows through a valve, a volume, two series-connected pressure losses, another volume, and another valve.
At initialization, the flow is about 1 kg/s, and the total pressure difference between the two volumes, 10000 Pa, is most taken by the two valves, with 5000 Pa each. The two pressure losses take only 10 Pa in Test1
; 10 and 50 Pa in Test2
. Since there is no volume in-between, an algebraic loop is formed, with the mass flow rate through the two components pressDrop1.w
as tearing variable.
This situation is representative of hydraulic and pneumatic circuits, where components with relatively high absolute pressure states are connected by flow models with very small quadratic pressure losses, and where direct connections between flow models without control volumes in-between can happen because of the connections of modules having a flow model directly attached to the fluid connector on each side. This situation is in fact very commonplace.
At time = 0.1 the two valves are completely closed, so that the inner circuit becomes isolated. Given the very small pressure losses between the two volumes, the expected physical behaviour is that the two volumes very quickly get to the same pressure, and the flow through the two series-connecteed pressure loss components becomes zero. As there is no inertia in the flow, the exact solution of the pressure transients is monotonous, with no oscillations.
The algebraic loop to determine pressDrop1.w
could be potentially critical when the flow rate approaches zero if a purely quadratic flow-dp relationship was used, because the Jacobian would become singular (i.e., zero in this case). However, a regularized flow-dp releationship has been used instead, which smoothly turns into a linear one with non-zero slope close to the origin, when the mass flow rate is below wnf
(0.01) times the nominal value, so the system is definitely not singular, even though it's very sensitive to small changes of the pressure state variables.
Once the equilibrium condition is reached, implicit adaptive step-size solvers such as dassl or ida should be able to increase the time step dramatically.
Unfortunately, for some reason this doesn't happen. If you try running Test1
, which has a tolerance of 1e-6 and both pressure losses with a nominal value of 10 Pa, once the valves are closed the pressures quickly get close to each other, but then a permanent numerical oscillation of the mass flow rates pressDrop1.w
and pressDrop2.w
with amplitude about 0.15 kg/s (which is 15% of the nominal value!) is triggered on the mass flow rate through the pressure losses. This oscillation goes on forever and does not allow the ODE solver to increase the step size above 0.2 ms, thus effectively killing the simulation performance.
Test2
produces the same situation with an asymmetric small pressure losses (nominal values 10 and 50 Pa, respectively). The frequency of the numerical oscillations is higher and the amplitude somewhat smaller, around 0.04 kg/s, but the final outcome is basically the same.
So, the first two questions are:
Q1: Why does this happen?
Q2: Why do I get an extremely slow, totally unphysical simulation result without the solver complaining at all about anything being suspicious, odd, or close to singular during this transient?
Trying to answer this question, I switched off equidistantTimeGrid
to better understand what the ODE solver does: Test3
has the same parameters as Test1
except for this detail. I was indeed very surprised to see the number of steps increasing from 550 to 3000, and the simulation time to increase from 0.6 s to 40 s(!). The oscillation amplitude decreased from 0.15 to 0.05, but the frequency increased dramatically. This leads to two more questions:
Q3. Why should
equidistantTimeGrid
affect the number of steps taken?
I understood that the solver takes whatever step length is necessary, and then the solution is re-sampled on the grid if equidistantTimeGrid
is enabled, but apparently I am missing something here. Please explain.
Q4. Why is the time step per successful step increase 10-fold when
equidistantTimeGrid
is activated?
Unfortunately profiling is currently broken (see #5030) so I can't see where most of the time is spent, but I suspect there's something wrong with the nonlinear system that solves for pressDrop1.w
For the record, simulating the model with Dymola leads to the same bad situation, so I guess this is really a fundamental issue with the causalization approach at the solution of this model.
Further decreasing the ODE solver tolerance to 1e-7 partially solves the problem, as demonstrated by Test4
. The mass flow rate shows some small undershoot below zero, a couple of well-damped oscillations, and then settles down to a constant value of 0.008, which is relatively small w.r.t. the nominal value of one (but definitely non negligible!), at which point the step size starts increasing dramatically. The mass flow rate is somwhat still wrong (it should be zero!) but at least the simulation is not bogged down by fast oscillations.
However, in some cases it is not really possible to tighen the tolerance so much, because other issues could arise, particularly in cases where the analytical Jacobian cannot be computed due to the complicated Media functions involved in the model, so that getting very high precision could become problematic. A significantly non-zero value of the flow in this case is non-physical and could lead to other issues with the simulation. Furthermore, these models usually have a lot of approximations, so seeking such a high precision in their time integration is really nonsense, in general. That one needs such a tight tolerance to avoid numerical issues is a sign of something odd going on.
I find this situation confusing for an end user. One would expect that reducing the tolerance increases the simulation time, not the other way round, and even more importantly, that a very tight tolerance is not essential for successful simulation.
Please note that in many practical applications, this kind of issues can actually cause the simulation to abort, making the simulation of fluid systems where some branches may occasionally be sealed off by closing valves a very critical task. Please note that the values of the parameters I selected are realistic (short pipes have very small dp's), so I'm in no way building an exotic test case just to stress the solver.
Last, but not least, if I switch the nonlinear solver to Kinsol, the simulations gets much slower, and a lot of messages such as
LOG_NLS_V | warning | kinsols line search did not convergence. Try without. LOG_NLS_V | warning | kinsols runs into issues retry with different configuration. LOG_NLS_V | warning | <p>Move forward with a less accurate solution.
show up, even though the simulation eventually reaches the end.
I guess the root cause of this issue is that the error control is performed on the absolute pressures (which are around 1e5), while the mass flow rate between the two volumes is very sensitive to much smaller delta-p changes of 0.001 or less, and very strongly nonlinear. Therefore the solver accepts steps which are perfectly fine in terms of relative accuracy of the states (-> absolute pressures) but which can still cause significant perturbations of the flow rate and possibly lead to convergence issues at the next time step, even though the state values change very little between steps. The problem is that once the previous step has been accepted (based on the relative error on the pressure states), there is no way to undo it if the next step has convergence issues: the solver can try to reduce the time step a lot, but it can't go backward in time.
According to this interpretation (which I believe in), the problem is not just that the system is ill-conditioned, but that it is also very strongly nonlinear. In fact, if I use linear pressure drop models with the same dp's, these problems do not show up at all.
From this point of view, one option is to increase the wnf
parameter of the pressure loss models, which makes the residual function of the algebraic loop much less strongly nonlinear close to zero (and with a larger Jacobian, which doesn't harm), thus helping the convergence. In fact, the empirical evidence is that higher values of wnf
on small pressure loss components help avoiding this problem. This strategy somehow saved my neck so far. However, it would be nice if we had robust ways of solving these systems without the need of resorting to these tricks.
One possible way of doing this would be to make a state variable change and use the absolute pressure of one volume and the pressure difference between the volumes as states, instead of the two absolute pressures. In this way, a reasonable tolerance like 1e-4 or 1e-5 on both states would avoid getting dp's large enough to cause significant mass flow rate oscillations. In fact, this approach works nicely, see Test5
, where the sought after solution is found very quickly and with high accuracy even with a relative tolerance of 1e-3(!), or Test6
, where the solution is found very quickly with the original accuracy of 1e-6.
The only problem here is that doing this in an object-oriented way is really not possible, because this delta-p spans several components - in which component should I define the p1 - p2
variable with the stateSelect.prefer
attribute? Also, a static choice of which dp's to use may not be always feasible, in a circuit where valves are opened and closed and parts of the circuit dynamically sealed off from the rest.
However, a smart enough back-end may automatically generate code to detect these situations and do the (dynamic) state variable change of its own initiative, selecting the right dp's as states to achieve robust and fast simulation. We may help the back-end doing this by re-using the branch-node mechanism of the overconstrained connectors. @bachmann, @karim.abdelhak, I guess this would be an interesting research topic for you.
Another possible point of view on this issue could be to consider this problem as "close to variable index". With a purely quadratic flow-dp relationship, the solution at dp = 0 would locally correspond to an index-2 problem, so if the linear regularization around zero flow is not strong enough, the problem becomes "almost index 2" when the valves are closed. Maybe this could give you some ideas how to attack the problem. However, this doesn't really include the "nonlinearity/convergence" factor which I believe plays an important role.
@wbraun, @bachmann, @karim.abdelhak, I would urge you to perform some analysis on this issue and try to come up with possible solutions. This issue has plagued many people doing fluid system modelling for 15+ years (not only myself) and it would be nice to finally get a robust solution based on sound theory, not just hacks. I'm available to help with all possible support on my side, because this is really important for us.
At the very least, if this is really a fundamental problem of the underlying equations, then we should issue some diagnostic message to the end-user to try avoiding it, e.g. by suggesting to reduce the tolerance or to make the small dp's less nonlinear close to zero flow.
Attachments (1)
Change History (6)
comment:1 by , 6 years ago
Description: | modified (diff) |
---|
by , 6 years ago
Attachment: | TestFlowLowDP.mo added |
---|
comment:2 by , 5 years ago
Milestone: | 1.14.0 → 1.16.0 |
---|
Releasing 1.14.0 which is stable and has many improvements w.r.t. 1.13.2. This issue is rescheduled to 1.16.0
comment:4 by , 4 years ago
Milestone: | 1.17.0 → 1.18.0 |
---|
Retargeted to 1.18.0 because of 1.17.0 timed release.
Test package