Speeding up Python-to-Python calls.

There are a number of things we can do to speed up Python-to-Python calls without changing the stack layout.

### Faster creation of interpreter frames

Our fastest Python-to-Python call is `_INIT_CALL_PY_EXACT_ARGS`
which is reasonably efficient, but could definitely be made faster.
There are a few issues with it:
* It contains a variable length loop.
* The inlined call to `_PyFrame_PushUnchecked` also contains a variable length loop.

We can make the loops fixed length by:
* Unconditionally copying `self_or_null`.
* Adjusting only the pointer, not the count, if `self_or_null` is not `NULL`.
  If `self_or_null` is `NULL`, it will then be overwritten.
* Breaking `_INIT_CALL_PY_EXACT_ARGS` into two parts, one to initialize the arguments and
  one to `NULL` out the remaining locals. Both can be marked `replicate` to avoid the loop.
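
The steps above can be sketched in plain C. This is only an illustration under assumptions: `Obj`, `init_args_N` and `clear_locals_N` are hypothetical names rather than CPython's real types or uops, and the macros stand in for what `replicate` would generate. It also assumes `locals` always has room for at least one slot.

```c
#include <stddef.h>

/* Hypothetical stand-in for an object reference; not CPython's real type. */
typedef void *Obj;

/* Part 1: copy `self_or_null` and exactly ARGC arguments into `locals`.
 * ARGC is a constant in each generated function, so the loop has a fixed
 * trip count and can be fully unrolled by the compiler.
 * Returns the number of local slots that now hold arguments. */
#define DEFINE_INIT_ARGS(ARGC)                                              \
    static inline size_t init_args_##ARGC(Obj *locals, Obj self_or_null,    \
                                          const Obj *args)                  \
    {                                                                       \
        int has_self = (self_or_null != NULL);                              \
        locals[0] = self_or_null;       /* unconditional copy */            \
        Obj *dst = locals + has_self;   /* adjust the pointer, not ARGC */  \
        for (int i = 0; i < (ARGC); i++) {                                  \
            dst[i] = args[i];  /* overwrites locals[0] when no self */      \
        }                                                                   \
        return (size_t)has_self + (ARGC);                                   \
    }

/* Part 2: set exactly N remaining local slots to NULL. */
#define DEFINE_CLEAR_LOCALS(N)                                              \
    static inline void clear_locals_##N(Obj *first_unset)                   \
    {                                                                       \
        for (int i = 0; i < (N); i++) {                                     \
            first_unset[i] = NULL;                                          \
        }                                                                   \
    }

/* One instantiation per replicated size. */
DEFINE_INIT_ARGS(1)
DEFINE_INIT_ARGS(2)
DEFINE_CLEAR_LOCALS(2)
```

In the `NULL` case the first argument lands in `locals[0]` and replaces the `NULL` that was just stored, so neither part needs a branch on `self_or_null` inside the copy.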

### Better optimization of other Py-to-Py calls in tier 2

We currently specialize the remaining Py-to-Py calls into "with defaults" and do not specialize
for "code complex parameters".
We should treat both the same in tier 1 as "CALL_PY", and expand the call sequence in tier 2 to
produce an optimal sequence of instructions.

This will probably make no overall difference to T1 performance: the "with defaults" case will get a tiny bit slower and the other cases might be a bit faster.

### Remove `f_globals` and `f_builtins` from the interpreter frame

In tier 2, we have largely eliminated access to `f_globals` and `f_builtins`.
We could speed up calls in tier 2, without slowing down access to globals there,
if we were to remove these fields.
Doing this will slow down access to globals in tier 1, however.
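
The trade-off can be illustrated with a simplified sketch. The structs below are hypothetical stand-ins, not the real `_PyInterpreterFrame` or `PyFunctionObject`, and the sketch assumes that, once the fields are gone, globals and builtins would be reached through the function object held by the frame.

```c
#include <stddef.h>

/* Hypothetical stand-in for a dict object. */
typedef struct { int placeholder; } Dict;

/* Simplified function object that owns its globals and builtins. */
typedef struct {
    Dict *func_globals;
    Dict *func_builtins;
} Function;

/* Frame that caches both dicts: two extra stores on every call. */
typedef struct {
    Function *f_funcobj;
    Dict *f_globals;
    Dict *f_builtins;
} FatFrame;

/* Frame without the cached fields: less work per call. */
typedef struct {
    Function *f_funcobj;
} SlimFrame;

static inline void fat_frame_init(FatFrame *f, Function *func)
{
    f->f_funcobj = func;
    f->f_globals = func->func_globals;    /* extra store per call */
    f->f_builtins = func->func_builtins;  /* extra store per call */
}

static inline void slim_frame_init(SlimFrame *f, Function *func)
{
    f->f_funcobj = func;
}

/* One load from the frame. */
static inline Dict *fat_frame_globals(const FatFrame *f)
{
    return f->f_globals;
}

/* Two dependent loads: the extra indirection is the tier 1 cost. */
static inline Dict *slim_frame_globals(const SlimFrame *f)
{
    return f->f_funcobj->func_globals;
}

static inline Dict *slim_frame_builtins(const SlimFrame *f)
{
    return f->f_funcobj->func_builtins;
}
```

Frame setup gets cheaper for every call, while each remaining global lookup that still goes through the frame pays one more dependent load.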

In order to get an overall speedup, the ratio of tier 2 to tier 1 code will need to increase.
Once the ratio of T2 to T1 code is 3:1 or better, it should be profitable to remove these fields.
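
One way to read the break-even point is with a simple linear model. The model and its symbols are assumptions made for illustration, not measurements: let $t_1$ and $t_2$ be the amounts of tier 1 and tier 2 code executed, $s$ the time saved per unit of tier 2 code from cheaper calls, and $c$ the time lost per unit of tier 1 code from slower access to globals.

$$
\text{net gain} = s\,t_2 - c\,t_1 > 0 \quad\Longleftrightarrow\quad \frac{t_2}{t_1} > \frac{c}{s}
$$

Under this model, a break-even ratio of 3:1 would correspond to $c \approx 3s$, that is, the per-unit slowdown in tier 1 being roughly three times the per-unit saving in tier 2.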

### Faster checking of stack space

https://github.com/faster-cpython/ideas/issues/620

Speeding up Python-to-Python calls. #661