Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How can I set DDL_LOCK_TIMEOUT before resetPassword? #35

Open
yaooqinn opened this issue Jan 4, 2024 · 45 comments
Open

How can I set DDL_LOCK_TIMEOUT before resetPassword? #35

yaooqinn opened this issue Jan 4, 2024 · 45 comments
Assignees
Labels
bug Something isn't working

Comments

@yaooqinn
Copy link

yaooqinn commented Jan 4, 2024

Hi @gvenzl

The Apache Spark Docker Integration tests sometimes fail due to the failure of docker to start. It seems like the reason behind this is ORA-04021. To fix the problem, we can try to increase DDL_LOCK_TIMEOUT. Can we set this value using environment variables?

24/01/04 04:22:09.284 docker-java-stream-568158377 INFO OracleIntegrationSuite: STDOUT: CONTAINER: Resetting SYS and SYSTEM passwords.
24/01/04 04:22:09.284 docker-java-stream-568158377 INFO OracleIntegrationSuite: STDOUT: 
24/01/04 04:22:09.284 docker-java-stream-568158377 INFO OracleIntegrationSuite: STDOUT: User altered.
24/01/04 04:22:09.284 docker-java-stream-568158377 INFO OracleIntegrationSuite: STDOUT: 
24/01/04 04:22:09.284 docker-java-stream-568158377 INFO OracleIntegrationSuite: STDOUT: ALTER USER SYSTEM IDENTIFIED BY "***************"
24/01/04 04:22:09.284 docker-java-stream-568158377 INFO OracleIntegrationSuite: STDOUT: *
24/01/04 04:22:09.284 docker-java-stream-568158377 INFO OracleIntegrationSuite: STDOUT: ERROR at line 1:
24/01/04 04:22:09.285 docker-java-stream-568158377 INFO OracleIntegrationSuite: STDOUT: ORA-65048: error encountered when processing the current DDL statement in
24/01/04 04:22:09.285 docker-java-stream-568158377 INFO OracleIntegrationSuite: STDOUT: pluggable database FREEPDB1
24/01/04 04:22:09.285 docker-java-stream-568158377 INFO OracleIntegrationSuite: STDOUT: ORA-04021: timeout occurred while waiting to lock object
24/01/04 04:22:09.285 docker-java-stream-568158377 INFO OracleIntegrationSuite: STDOUT: Help: https://docs.oracle.com/error-help/db/ora-65048/
yaooqinn added a commit to yaooqinn/oci-oracle-free that referenced this issue Jan 9, 2024
@gvenzl gvenzl self-assigned this Jan 28, 2024
@gvenzl gvenzl added the bug Something isn't working label Jan 28, 2024
@gvenzl
Copy link
Owner

gvenzl commented Jan 28, 2024

Hey @yaooqinn,

Thanks a lot for using these images for the Apache Spark integration tests!

Yeah, I have no problem increasing the timeout (already saw the PR, thanks!) but I'm curious as to why this error is happening.
Technically, not much should be happening on the database yet that would cause a DDL lock timeout.
Are the Spark jobs connecting via the SYS user at this stage already and creating some additional users or something like that?

@yaooqinn
Copy link
Author

Thank you @gvenzl.

Spark's docker IT uses docker-java library to start an oracle-free:23.3 container. After setting up some specs, it first uses startContainerCmd to launch the container and waits in a loop with inspectContainerCmd for the running state. If it detects the container is running, then it uses a time-limited loop with an interval of 1 second to wait for the JDBC connection to be established.

Technically, not much should be happening on the database yet that would cause a DDL lock timeout.

Would establishing the connection within an interval of 1 second be considered pushy and cause the DDL LOCK problem?

Are the Spark jobs connecting via the SYS user at this stage already and creating some additional users or something like that?

The SYSTEM user identifier is used for connections, not SYS. No additional users are being created.

@gvenzl
Copy link
Owner

gvenzl commented Feb 5, 2024

Hey @yaooqinn,

Thanks a lot for the background, this is interesting.

Technically, not much should be happening on the database yet that would cause a DDL lock timeout.

Would establishing the connection within an interval of 1 second be considered pushy and cause the DDL LOCK problem?

No, an interval of 1 second shouldn't be a problem at all, and actually, it cannot cause this error. ORA-04021: timeout occurred while waiting to lock object means that something has a lock on the table while something else is trying to change it. If there is no connection to the database yet, the connection cannot hold a lock.

Are the Spark jobs connecting via the SYS user at this stage already and creating some additional users or something like that?

The SYSTEM user identifier is used for connections, not SYS. No additional users are being created.

Hm, even more peculiar. Given that the SYSTEM password reset fails with this error, I can't see how anything in the test would cause that error, as no test would be able to logon due to the not-yet-reset password.

The more I think about this, the more I think we have found some sort of race condition or bug here. Is there any way for me to see what the regression tests execute, what's done inside inspectContainerCmd and perhaps for me to clone the repo and try to reproduce this myself?

@yaooqinn
Copy link
Author

yaooqinn commented Feb 5, 2024

Thank you @gvenzl.

Here is I recent example I have found:
https://github.com/apache/spark/actions/runs/7749877156/job/21135223061

To reproduce this in GitHub Actions, you can fork the spark repo, and the Run / Run Docker integration tests job will run in your forked repo by push events.

To reproduce locally,

ORACLE_DOCKER_IMAGE_NAME=gvenzl/oracle-free:23.3 ./build/sbt -Pdocker-integration-tests "docker-integration-tests/testOnly *OracleIntegrationSuite"

Here is a detailed guide you may refer to: https://github.com/apache/spark/tree/master/connector/docker-integration-tests#readme

FYI, the code for docker container initialization can be found at:
https://github.com/apache/spark/blob/e3fd64b96765dc04994798e79161026bd1361230/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala#L182

@yaooqinn
Copy link
Author

yaooqinn commented Feb 5, 2024

When the GitHub action job fails, you can found the error log on the Summary page , such as, unit-tests-log-docker-integration--17-hadoop3-hive2.3

@gvenzl
Copy link
Owner

gvenzl commented Feb 5, 2024

Thanks a lot, @yaooqinn!

Let me give this a spin and take a look what's going on here.

@yaooqinn
Copy link
Author

yaooqinn commented Feb 5, 2024

Thank you very much for the help @gvenzl

@gvenzl
Copy link
Owner

gvenzl commented Feb 5, 2024

Ok, I got it running locally.
To no big surprise, so far I haven't managed to reproduce it yet.

However, I have looked through a bunch of failed tests and seem to find a lot more runs that simply time out with:

[info] OracleIntegrationSuite:
[info] org.apache.spark.sql.jdbc.OracleIntegrationSuite *** ABORTED *** (7 minutes, 31 seconds)
[info] The code passed to eventually never returned normally. Attempted 418 times over 7.002120891733333 minutes. Last failure message: ORA-12541: Cannot connect. No listener at host 10.1.0.117 port 43305. (CONNECTION_ID=7sVBltRxQr+1iXsn3hBBtQ==)

I'll keep digging.

@yaooqinn
Copy link
Author

yaooqinn commented Feb 5, 2024

Besides the console log, much richer information, including those from docker container, could be found at connector/docker-integration-tests/target/unit-tests.log

@gvenzl
Copy link
Owner

gvenzl commented Feb 7, 2024

Thanks a lot, @yaooqinn,

Btw, I don't think that the workaround is an actual workaround.

According to

https://github.com/gvenzl/spark/blob/07f1341fac6ff23f1c5a30c87a0d9946ca794e61/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleDatabaseOnDocker.scala#L47-L59

The DDL_LOCK_TIMEOUT is set as an init script which is executed after the initial ALTER USER, i.e. the workaround sets the DDL_LOCK_TIMEOUT after the two User altered. statements:

ORACLE instance started.

Total System Global Area 1603679416 bytes
Fixed Size		    5313720 bytes
Variable Size		  738197504 bytes
Database Buffers	  855638016 bytes
Redo Buffers		    4530176 bytes
Database mounted.
Database opened.

CONTAINER: Resetting SYS and SYSTEM passwords.

User altered.


User altered.


CONTAINER: Executing user-defined scripts...

CONTAINER: running /docker-entrypoint-initdb.d/install.sql ...

Session altered.


User altered.

CONTAINER: DONE: running /docker-entrypoint-initdb.d/install.sql

CONTAINER: DONE: Executing user-defined scripts.


#########################
DATABASE IS READY TO USE!
#########################

Yet, the original error occurred during the second User altered. and hence before the init scripts are executed:

CONTAINER: Resetting SYS and SYSTEM passwords.

User altered.

ALTER USER SYSTEM IDENTIFIED BY "***************"
*
ERROR at line 1:
ORA-65048: error encountered when processing the current DDL statement in
pluggable database FREEPDB1
ORA-04021: timeout occurred while waiting to lock object
Help: https://docs.oracle.com/error-help/db/ora-65048/

@gvenzl
Copy link
Owner

gvenzl commented Feb 7, 2024

Hey @yaooqinn,

Is there any way to see or extract the container logs in cases such as this one:

[info] org.apache.spark.sql.jdbc.OracleIntegrationSuite *** ABORTED *** (7 minutes, 31 seconds)
[info]   The code passed to eventually never returned normally. Attempted 418 times over 7.002120891733333 minutes. Last failure message: ORA-12541: Cannot connect. No listener at host 10.1.0.117 port 43305. (CONNECTION_ID=7sVBltRxQr+1iXsn3hBBtQ==)
[info]   [https://docs.oracle.com/error-help/db/ora-12541/.](https://docs.oracle.com/error-help/db/ora-12541/) (DockerJDBCIntegrationSuite.scala:189)
[info]   org.scalatest.exceptions.TestFailedDueToTimeoutException:
[info]   at org.scalatest.enablers.Retrying$$anon$4.tryTryAgain$2(Retrying.scala:219)
[info]   at org.scalatest.enablers.Retrying$$anon$4.retry(Retrying.scala:226)
[info]   at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:313)
[info]   at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:312)
[info]   at org.apache.spark.sql.jdbc.DockerJDBCIntegrationSuite.eventually(DockerJDBCIntegrationSuite.scala:99)
[info]   at org.apache.spark.sql.jdbc.DockerJDBCIntegrationSuite.$anonfun$beforeAll$1(DockerJDBCIntegrationSuite.scala:189)
[info]   at org.apache.spark.sql.jdbc.DockerIntegrationFunSuite.runIfTestsEnabled(DockerIntegrationFunSuite.scala:49)

@yaooqinn
Copy link
Author

@gvenzl. As you mentioned earlier, the workaround on the Spark side was not helpful at all.

Is there any way to see or extract the container logs in cases such as this one:

Additional logs can be found in connector/docker-integration-tests/target/unit-tests.log
Or, you can keep the container running after tests finished by this setting

@gvenzl
Copy link
Owner

gvenzl commented Feb 25, 2024

Thanks @yaooqinn,

Unfortunately, neither of these are very helpful.

My latest suspicion is that the runner simply runs out of either horsepower, i.e. CPU, or memory.
When I ran these tests locally on a VM with only 4G of memory, it wasn't pretty :)
So I was wondering what would make the database not be able to spin up a listener or take forever to run a simple ALTER USER command.
Nothing really should, as no activity on the DB has been executed to that stage, it's just starting up. So I was wondering whether the runner is simply overloaded.

From what I've found out, these runners come with 4 CPUs only, and 16 GB memory, which ought to be enough to run these tests, unless there are some executed in parallel.
I've added a couple of checks for the memory and CPU before the stage where no tests have been run yet. Obviously, that's not very helpful but at least it shows what the runner has available: https://github.com/gvenzl/spark/actions/runs/8015324674/job/21895352666 (see Run --> Run Docker integration tests --> Free memory & CPU stats staged)

It seems like running these tests eats up about 3 - 3.5GB of RAM:

Here is the output from a run with ORACLE_DOCKER_IMAGE_NAME=gvenzl/oracle-free:23.3 ./build/sbt -Pdocker-integration-tests "docker-integration-tests/testOnly *OracleIntegrationSuite":

[gvenzl@localhost ~]$ vmstat 3
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 13214936   1628 2454320    0    0  1986  4577 1027  798 13 14 70  3  0
 3  0      0 12933988   1628 2467436    0    0  4359     0 2555 1986 12  8 79  1  0
 3  0      0 12704384   1628 2468092    0    0   221     0 5424 2850 49 20 30  1  0
 4  0      0 12404572   1628 2481192    0    0  4367   448 6482 3243 41 29 30  0  0
 2  0      0 12373512   1628 2481812    0    0   148     0 4817 2068 50 17 32  0  0
 2  0      0 12284284   1628 2481812    0    0     0    12 4194 1889 41 18 41  0  0
 3  0      0 11970348   1628 2481888    0    0    13  2160 6102 5542 46 29 25  0  0 <-- compilation starts up
 4  0      0 11901908   1628 2481928    0    0    19     0 5073 2778 47 19 33  0  0
 1  0      0 11700828   1628 2481928    0    0     0   887 4241 2020 36 17 47  0  0
 2  0      0 11265520   1628 2482028    0    0    35   936 4075 1928 35 14 51  0  0
 2  0      0 10875900   1628 2482032    0    0     0  1028 3649 1875 27 13 61  0  0
 6  0      0 10531076   1628 2483464    0    0   523   907 6572 7186 34 32 34  0  0
11  0      0 10333472   1628 2503632    0    0  6768   859 7948 6059 36 58  6  0  0
 6  0      0 10166876   1628 2542188    0    0 12805   181 7390 1878 22 46 31  0  0
 7  0      0 9856304   1628 2569000    0    0  9047    55 7244 3249 36 45 19  0  0
 1  0      0 9351240   1628 2583280    0    0  4615   146 5543 2739 27 22 51  1  0
 4  0      0 8777552   1628 2583280    0    0     0    53 2445  963 16  9 75  0  0
 1  0      0 8716568   1628 2583312    0    0     0     0 3453 1061 31 14 55  0  0
 2  0      0 8590052   1628 2583328    0    0     0    39 1838  788  8  7 85  0  0
 8  0      0 8224272   1628 2583620    0    0     0    60 6060 4179 45 30 25  0  0
 4  0      0 8077572   1628 2583620    0    0     0     3 4931 2805 47 32 21  0  0
 3  0      0 8002792   1628 2583624    0    0     0    75 4081 2088 32 16 51  0  0  <-- First Oracle DB docker container is started (note that a lot of memory is allocated to the Linux FS cache, not necessarily not taken away)
 5  0      0 6827184   1628 3387180    0    0  5581   768 5048 2785 33 40 26  1  0
 5  1      0 5646408   1628 4550088    0    0  1319 270095 4182  998 34 38 17 10  0
 4  1      0 5066112   1628 5098800    0    0     0 355201 4023  807 38 46  2 13  0
 0  1      0 4844880   1628 5611496    0    0   821 225870 4557 2902 14 23 49 13  0
 0  0      0 4842696   1628 5611668    0    0  1931  3427 5922 7297  1 15 77  7  0
 1  0      0 4630312   1628 5820208    0    0   629 142492 4192 4286  2 14 68 16  0
 4  0      0 2965884   1628 7167516    0    0   561  1186 3750 2324 10 17 69  3  0
 1  0      0 2844004   1628 7167928    0    0     4  2294 5250 4316 34 23 38  6  0
 2  0      0 2797840   1628 7168176    0    0     0  2312 4987 4693 26 18 49  6  0
 3  0      0 2744224   1628 7168044    0    0     0   109 4059 2484 19 17 63  0  0
 9  0      0 2692996   1628 7168052    0    0     0   924 4673 3094 32 20 47  1  0
 5  0      0 2681664   1628 7168388    0    0     0   820 4772 3102 23 21 54  2  0
 3  0      0 2631880   1628 7168444    0    0    11    72 4755 3393 31 16 53  0  0
 3  0      0 2555164   1628 7168560    0    0     0   341 5438 4133 37 20 43  1  0
 3  0      0 2416756   1628 7168728    0    0    13 10673 5054 3783 37 21 42  0  0
 4  0      0 2289964   1628 7168852    0    0     0    88 5854 5425 25 20 54  0  0
 3  0      0 7295732   1628 2606544    0    0   165  1109 5357 5365 12 18 69  1  0 <-- second Oracle DB container starts up (again, deallocation and allocation from Linux FS cache)
 4  2      0 5536388   1628 4133824    0    0     8 132050 3862 1247 31 24 38  7  0
 5  0      0 4620440   1628 5035524    0    0     0 382680 3964  584 39 28  9 25  0
 0  1      0 4364196   1628 5613288    0    0     8 335960 3074  896 17 15 45 23  0
 0  0      0 4350664   1628 5613524    0    0  2255 109995 4864 5435  1 14 69 17  0
 0  0      0 4132684   1628 5822236    0    0   757  3105 5355 6371  1 13 79  6  0
 1  0      0 2514348   1628 7170088    0    0   991 33561 4869 4254  8 18 65  9  0
 3  0      0 2359032   1628 7170556    0    0     8  2082 4655 3460 26 20 50  4  0
 7  1      0 2280500   1628 7170844    0    0     0  2683 5875 5548 27 21 46  6  0
 9  0      0 2164488   1628 7171032    0    0     0   628 5245 3588 40 27 32  1  0
 4  0      0 2135056   1628 7171304    0    0     0   593 5164 3743 36 24 39  1  0
 0  0      0 13053436   1628 2607480    0    0     0 12603 4258 5729 10 15 74  1  0
 0  0      0 13052744   1628 2607448    0    0     0     0  125  150  0  1 99  0  0

With these tests taking about 3 - 3.5 GB of memory and Oracle DB taking about another 1.5 ~ 2 GB, an active run would worst case take about 5.5GB, which in itself shouldn't be an issue, unless other stuff runs in parallel on the machine.

Same with the CPU, the run queue is somewhere around nothing to 4/5 ish, again, not a biggy, unless other things run in parallel, perhaps.

What would be really interesting is if we could inject these resource verification steps while these tests are running.
Would you happen to know an easy way to do that?
Or do you know whether anything else is running in parallel on the same runner?

@gvenzl
Copy link
Owner

gvenzl commented Feb 25, 2024

I forgot to add that so far I was unable to reproduce this locally on my VM , another reason why I'm suspicious whether this is an issue of too many things running in parallel during the GHA executions.

@yaooqinn
Copy link
Author

Thank you for the detailed information.

Would you happen to know an easy way to do that?

Unfortunately, I don't know.

Or do you know whether anything else is running in parallel on the same runner?

There are no parallel test works being executed.

Given on my observations, the error in GHA always/coincidentally occured on the second OracleIntegrationSuite.

I forgot to add that so far I was unable to reproduce this locally on my VM , another reason why I'm suspicious whether this is an issue of too many things running in parallel during the GHA executions.

The issue can be reproduced my local Apple M2/colima docker context, it should not be GHA-specific.

@yaooqinn
Copy link
Author

If the load is the root cause, changing the ddl lock timeout seems to be a positive change https://github.com/gvenzl/oci-oracle-free/pull/37?

@yaooqinn
Copy link
Author

Would you happen to know an easy way to do that?

I have discovered that we can include a health check to ensure that we achieve this goal. https://github.com/docker-java/docker-java/pull/2244/files

@gvenzl
Copy link
Owner

gvenzl commented Mar 3, 2024

Hey @yaooqinn,

Thanks, I wasn't aware that you can replicate it in your local environment.
Then let me take back my theory regarding the resource consumption. I think you would notice if your laptop would come to a grinding halt and it should have more than enough horsepower to run tests even in parallel.

The DDL_LOCK_TIMEOUT won't achieve anything, unfortunately.
The error Last failure message: ORA-12541: Cannot connect. No listener at host 10.1.0.117 port 43305. (CONNECTION_ID=7sVBltRxQr+1iXsn3hBBtQ==) means that the program couldn't even connect to the database.
The question remains as to why.

When you encounter this error locally, could you please provide me with the log from the currently running Oracle Database container via (docker logs <container name>)?

I have still not managed to reproduce this issue at my end, but will continue to try.

I have discovered that we can include a health check to ensure that we achieve this goal. https://github.com/docker-java/docker-java/pull/2244/files

Hm, I'm not sure what exactly that addition does but it sounds like it only provides for a start interval delay for running the healthcheck. What we need really is the container's log from when that failure ocurrs, do you think this will provide us with that?

Thanks,

@yaooqinn
Copy link
Author

yaooqinn commented Mar 5, 2024

The DDL_LOCK_TIMEOUT won't achieve anything, unfortunately.
The error Last failure message: ORA-12541: Cannot connect. No listener at host 10.1.0.117 port 43305. (CONNECTION_ID=7sVBltRxQr+1iXsn3hBBtQ==) means that the program couldn't even connect to the database.
The question remains as to why.

It seems that ORA-12541 can be ignored, the normal ones also include such errors

@gvenzl
Copy link
Owner

gvenzl commented Mar 5, 2024

Ok, although I saw test failures on the official Spark repository actions with this error.

In any case, if you could provide me with the container log, it would already help tremendously.
Even more so if you could also copy the following file out from the container (the database main log): /opt/oracle/diag/rdbms/free/FREE/trace/alert_FREE.log

@yaooqinn
Copy link
Author

yaooqinn commented Mar 5, 2024

24/03/05 14:05:37.470 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: CONTAINER: starting up...
24/03/05 14:05:37.470 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: CONTAINER: first database startup, initializing...
24/03/05 14:05:37.483 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: CONTAINER: uncompressing database data files, please wait...
24/03/05 14:05:37.483 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: CONTAINER: done uncompressing database data files, duration: 52 seconds.
24/03/05 14:05:37.483 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: CONTAINER: starting up Oracle Database...
24/03/05 14:05:37.483 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: 
24/03/05 14:05:37.483 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: LSNRCTL for Linux: Version 23.0.0.0.0 - Production on 05-MAR-2024 06:02:01
24/03/05 14:05:37.483 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: 
24/03/05 14:05:37.483 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: Copyright (c) 1991, 2023, Oracle.  All rights reserved.
24/03/05 14:05:37.483 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: 
24/03/05 14:05:37.483 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: Starting /opt/oracle/product/23c/dbhomeFree/bin/tnslsnr: please wait...
24/03/05 14:05:37.483 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: 
24/03/05 14:05:37.483 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: TNSLSNR for Linux: Version 23.0.0.0.0 - Production
24/03/05 14:05:37.483 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: System parameter file is /opt/oracle/product/23c/dbhomeFree/network/admin/listener.ora
24/03/05 14:05:37.483 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: Log messages written to /opt/oracle/diag/tnslsnr/4a50261d8539/listener/alert/log.xml
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC_FOR_FREE)))
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=0.0.0.0)(PORT=1521)))
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: 
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC_FOR_FREE)))
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: STATUS of the LISTENER
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: ------------------------
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: Alias                     LISTENER
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: Version                   TNSLSNR for Linux: Version 23.0.0.0.0 - Production
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: Start Date                05-MAR-2024 06:02:01
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: Uptime                    0 days 0 hr. 0 min. 0 sec
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: Trace Level               off
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: Security                  ON: Local OS Authentication
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: SNMP                      OFF
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: Default Service           FREE
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: Listener Parameter File   /opt/oracle/product/23c/dbhomeFree/network/admin/listener.ora
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: Listener Log File         /opt/oracle/diag/tnslsnr/4a50261d8539/listener/alert/log.xml
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: Listening Endpoints Summary...
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC_FOR_FREE)))
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=0.0.0.0)(PORT=1521)))
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: The listener supports no services
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: The command completed successfully
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: ORACLE instance started.
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: 
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: Total System Global Area 1603679416 bytes
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: Fixed Size		    5313720 bytes
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: Variable Size		  738197504 bytes
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: Database Buffers	  855638016 bytes
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: Redo Buffers		    4530176 bytes
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: Database mounted.
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: Database opened.
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: 
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: CONTAINER: Resetting SYS and SYSTEM passwords.
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: 
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: User altered.
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: 
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: 
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: User altered.
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: 
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: 
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: CONTAINER: Executing user-defined scripts...
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: 
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: CONTAINER: running /docker-entrypoint-initdb.d/install.sql ...
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: 
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: Session altered.
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: 
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: ALTER USER SYSTEM IDENTIFIED BY "Th1s1sThe0racle#Pass"
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: *
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: ERROR at line 1:
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: ORA-65048: error encountered when processing the current DDL statement in
24/03/05 14:05:37.484 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: pluggable database FREEPDB1
24/03/05 14:05:37.500 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: ORA-04021: timeout occurred while waiting to lock object
24/03/05 14:05:37.502 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: Help: https://docs.oracle.com/error-help/db/ora-65048/
24/03/05 14:05:37.503 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: 
24/03/05 14:05:37.504 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: 
24/03/05 14:05:37.505 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: CONTAINER: DONE: running /docker-entrypoint-initdb.d/install.sql
24/03/05 14:05:37.506 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: 
24/03/05 14:05:37.507 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: CONTAINER: DONE: Executing user-defined scripts.
24/03/05 14:05:37.509 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: 
24/03/05 14:05:37.509 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: 
24/03/05 14:05:37.509 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: #########################
24/03/05 14:05:37.509 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: DATABASE IS READY TO USE!
24/03/05 14:05:37.509 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: #########################
24/03/05 14:05:37.509 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: 
24/03/05 14:05:37.509 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: ####################################################################
24/03/05 14:05:37.509 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: CONTAINER: The following output is now from the alert_FREE.log file:
24/03/05 14:05:37.509 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: ####################################################################
24/03/05 14:05:37.509 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: Dumping current patch information
24/03/05 14:05:37.509 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: ===========================================================
24/03/05 14:05:37.509 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: Unable to obtain current patch information due to error: 20013
24/03/05 14:05:37.509 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: ===========================================================
24/03/05 14:05:37.509 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: FREEPDB1(3):Autotune of undo retention is turned on.
24/03/05 14:05:37.509 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: 2024-03-05T06:03:11.692486+00:00
24/03/05 14:05:37.509 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: FREEPDB1(3):Opening pdb with Resource Manager plan: DEFAULT_PLAN
24/03/05 14:05:37.509 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: 2024-03-05T06:03:15.503934+00:00
24/03/05 14:05:37.509 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: Completed: Pluggable database FREEPDB1 opened read write
24/03/05 14:05:37.509 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: Completed: ALTER DATABASE OPEN
24/03/05 14:05:37.509 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: 2024-03-05T06:04:09.299018+00:00
24/03/05 14:05:37.509 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: Thread 1 advanced to log sequence 14 (LGWR switch),  current SCN: 2337234
24/03/05 14:05:37.509 docker-java-stream-500438131 INFO OracleIntegrationSuite: STDOUT: Current log# 1 seq# 14 mem# 0: /opt/oracle/oradata/FREE/redo01.log

@yaooqinn
Copy link
Author

yaooqinn commented Mar 5, 2024

The Java API was used to collect the logs above, which are expected to be the same as the output of the docker logs command.

@gvenzl
Copy link
Owner

gvenzl commented Mar 30, 2024

Thanks a lot, @yaooqinn.

Is there any reason why you are resetting the SYSTEM password twice?
The SYSTEM password has already been reset (successfully) just about the user-defined scripts:

CONTAINER: Resetting SYS and SYSTEM passwords.

User altered.


User altered.


CONTAINER: Executing user-defined scripts...
...

@gvenzl
Copy link
Owner

gvenzl commented Mar 30, 2024

Sorry, I just found it here:

https://github.com/gvenzl/spark/blob/5fa42804326f41e5d107a787d96fd8d7bcf4a215/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleDatabaseOnDocker.scala#L47-L56

That also confirms again that the workaround of increasing the DDL_LOCK_TIMEOUT to a higher number won't remedy the situation either, as the ALTER SESSION SET DDL_LOCK_TIMEOUT=30 has just been processed before the second ALTER USER SYSTEM IDENTIFIED ...

@yaooqinn
Copy link
Author

yaooqinn commented Apr 1, 2024

Hi @gvenzl

I have witnessed failures even with DDL_LOCK_TIMEOUT=30, indicating its ineffectiveness.

I have no idea how to fix this:(

@gvenzl
Copy link
Owner

gvenzl commented May 5, 2024

Hey @yaooqinn,

I chatted about this with my colleagues a bit and there is no easy way to diagnose this without getting trace files from the database to know where this is happening.
This would require some magic as part of the init script, basically, it would have to be a script that enables tracing and, when this error occurs, looks for the trace file and writes it to STDOUT so that we can see it via the container logs.

I can help craft such a init script, but a fair warning, it will probably require a couple of round trips.

However, before we go there, Oracle just released a new version of Oracle Database Free (23.4) last Thursday, and I updated my images last night.
So, in the slight chance that this was some internal issue that was fixed in 23.4 in the meantime, could we first try and have Apache Spark run against the latest gvenzl/oracle-free:23.4-slim image?
I don't know how much effort that is, ideally just a simple switch of one digit in https://github.com/apache/spark/blob/bf2e25459fe46ca2b1d26e1c98c873923fc135e1/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleDatabaseOnDocker.scala#L29 and nothing else.

@yaooqinn
Copy link
Author

yaooqinn commented May 6, 2024

Thank you @gvenzl

I have seen a PR apache/spark#46399 for bumping up to 23.4

@LucaCanali
Copy link

I recommend removing the code that sets the password in override def beforeContainerStart. As @gvenzl pointed out, this step is redundant because setting the environment variable ORACLE_PASSWORD already configures the SYSTEM password as well. I've conducted a preliminary test and confirmed that the functionality remains intact without this segment of code. If there are no objections, I'm ready to submit a pull request to Apache Spark to implement this change.

@LucaCanali
Copy link

LucaCanali commented May 15, 2024

I have created a PR for this , see https://issues.apache.org/jira/browse/SPARK-48289 basically reverting the extra password change and DDL_LOCK_TIMEOUT. It does look like we hit a bug / unwanted feature ;) in Oracle (maybe related to PDB?) that cannot be easily worked around with the attempted setting of DDL_LOCK_TIMEOUT (good try BTW). So my view is that the extra code can be removed as it creates extra space for issues. Our hope at this stage is that the issue is not there in Oracle Free 23.4. Otherwise we can keep investigating.

yaooqinn added a commit to apache/spark that referenced this issue May 16, 2024
…dundant SYSTEM password reset

### What changes were proposed in this pull request?
This pull request improves the Oracle JDBC tests by skipping the redundant SYSTEM password reset.

### Why are the changes needed?
These changes are necessary to clean up the Oracle JDBC tests.

This pull request effectively reverts the modifications introduced in [SPARK-46592](https://issues.apache.org/jira/browse/SPARK-46592) and [PR #44594](#44594), which attempted to work around the sporadic occurrence of ORA-65048 and ORA-04021 errors by setting the Oracle parameter DDL_LOCK_TIMEOUT.

As discussed in [issue #35](gvenzl/oci-oracle-free#35), setting DDL_LOCK_TIMEOUT did not resolve the issue. The root cause appears to be an Oracle bug or unwanted behavior related to the use of Pluggable Database (PDB) rather than the expected functionality of Oracle itself.

Additionally, with [SPARK-48141](https://issues.apache.org/jira/browse/SPARK-48141), we have upgraded the Oracle version used in the tests to Oracle Free 23ai, version 23.4. This upgrade should help address some of the issues observed with the previous Oracle version.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
This patch was tested using the existing test suite, with a particular focus on Oracle JDBC tests. The following steps were executed:
```
export ENABLE_DOCKER_INTEGRATION_TESTS=1
./build/sbt -Pdocker-integration-tests "docker-integration-tests/testOnly org.apache.spark.sql.jdbc.OracleIntegrationSuite"
```

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #46598 from LucaCanali/fixOracleIntegrationTests.

Lead-authored-by: Kent Yao <[email protected]>
Co-authored-by: Luca Canali <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
@gvenzl
Copy link
Owner

gvenzl commented May 18, 2024

Hey @LucaCanali, thanks for your help on this!

If this issue reoccurs, it would be great to have a reproducible test case, ideally outside of the Spark test runs via a "pure" Docker + scripts environment as that would make it easier for me to debug.

But hopefully, this is solved now.

@yaooqinn
Copy link
Author

yaooqinn commented May 20, 2024

Thanks @gvenzl and @LucaCanali

But hopefully, this is solved now.

I haven't seen that error in a long time since the update. Everything seems fine now.

@yaooqinn
Copy link
Author

Oops, the error is still there to make the test flakey https://github.com/yaooqinn/spark/actions/runs/9155473477/job/25167935142

@LucaCanali
Copy link

I can see the error:

[info] org.apache.spark.sql.jdbc.OracleIntegrationSuite *** ABORTED *** (5 minutes, 17 seconds)
[info]   The code passed to eventually never returned normally. Attempted 299 times over 5.00881802855 minutes. Last failure message: ORA-12541: Cannot connect. No listener at host 10.1.0.151 port 46019. (CONNECTION_ID=IN3RqPhUS3Kha0Wdff/dPg==)

@LucaCanali
Copy link

LucaCanali commented May 20, 2024

This issue might be related to port forwarding. Based on the error message, it appears that IP:PORT 10.1.0.151:46019 should be forwarded to the container's port 1521, where the listener runs. If this forwarding is not functioning correctly, it could explain the ORA-12541 error. Another possibility is that the listener did not start up properly within the container, though this seems less likely to me, unless there is a high load on the system, which could cause the Oracle startup process to be significantly delayed?

@yaooqinn
Copy link
Author

yaooqinn commented May 21, 2024

@LucaCanali It's not a port forwarding issue, DDL_LOCK_TIMEOUT errors it. You can find more information about the issue by referring to the above discussions

@LucaCanali
Copy link

@yaooqinn, I reviewed the logs from the following job run: https://github.com/yaooqinn/spark/actions/runs/9155473477/job/25167935142 is The error identified in the logs is ORA-12541: TNS:no listener, indicating that the client cannot connect to the Oracle service on host 10.1.0.151 at port 46019.

This error seems to be unrelated to the previously reported errors ORA-65048 and ORA-04021. ORA-04021, as previously discussed in this thread, is usually associated with a timeout waiting for a resource, such as a library cache lock (when changing the password for example). ORA-65048 typically relates to an issue with attempting to access an invalid or non-existent container within a pluggable database environment. These errors suggest different root causes compared to the current connectivity issue.

Additionally, could we gain insights into the current frequency of failures with the OracleIntegrationSuite? Are there statistics or logs available that detail these occurrences? Moreover, is there a way to assess if the testing infrastructure might be experiencing high CPU load, potentially leading to slow container image start times? This information could be crucial for identifying recurring patterns or persistent issues.

Thank you!

@yaooqinn
Copy link
Author

Hi @LucaCanali. I am sorry that I attached the wrong link as I was filing multiple PRs to spark repo at that time, it shall be this one - https://github.com/yaooqinn/spark/actions/runs/9155585688/job/25169246357

It's still ORA-65048 with ORA-04021 as the root cause

===== END OF CONTAINER LOGS FOR container Id: CreateContainerResponse(id=3c6690ffa72f630c79bdde515d9a6a9128dacc3fabae20df3713b670d0065d8d, warnings=[]) =====
24/05/20 09:20:40.045 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: CONTAINER: starting up...
24/05/20 09:20:40.045 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: CONTAINER: first database startup, initializing...
24/05/20 09:20:40.045 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: CONTAINER: uncompressing database data files, please wait...
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: CONTAINER: done uncompressing database data files, duration: 7 seconds.
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: CONTAINER: starting up Oracle Database...
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: 
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: LSNRCTL for Linux: Version 23.0.0.0.0 - Production on 20-MAY-2024 09:15:46
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: 
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: Copyright (c) 1991, 2024, Oracle.  All rights reserved.
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: 
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: Starting /opt/oracle/product/23ai/dbhomeFree/bin/tnslsnr: please wait...
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: 
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: TNSLSNR for Linux: Version 23.0.0.0.0 - Production
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: System parameter file is /opt/oracle/product/23ai/dbhomeFree/network/admin/listener.ora
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: Log messages written to /opt/oracle/diag/tnslsnr/3c6690ffa72f/listener/alert/log.xml
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC_FOR_FREE)))
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=0.0.0.0)(PORT=1521)))
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: 
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC_FOR_FREE)))
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: STATUS of the LISTENER
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: ------------------------
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: Alias                     LISTENER
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: Version                   TNSLSNR for Linux: Version 23.0.0.0.0 - Production
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: Start Date                20-MAY-2024 09:15:46
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: Uptime                    0 days 0 hr. 0 min. 0 sec
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: Trace Level               off
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: Security                  ON: Local OS Authentication
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: SNMP                      OFF
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: Default Service           FREE
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: Listener Parameter File   /opt/oracle/product/23ai/dbhomeFree/network/admin/listener.ora
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: Listener Log File         /opt/oracle/diag/tnslsnr/3c6690ffa72f/listener/alert/log.xml
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: Listening Endpoints Summary...
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC_FOR_FREE)))
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=0.0.0.0)(PORT=1521)))
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: The listener supports no services
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: The command completed successfully
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: ORACLE instance started.
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: 
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: Total System Global Area 1603726344 bytes
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: Fixed Size		    5360648 bytes
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: Variable Size		  788529152 bytes
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: Database Buffers	  805306368 bytes
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: Redo Buffers		    4530176 bytes
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: Database mounted.
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: Database opened.
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: 
24/05/20 09:20:40.046 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: CONTAINER: Resetting SYS and SYSTEM passwords.
24/05/20 09:20:40.047 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: 
24/05/20 09:20:40.047 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: User altered.
24/05/20 09:20:40.047 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: 
24/05/20 09:20:40.047 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: ALTER USER SYSTEM IDENTIFIED BY "Th1s1sThe0racle#Pass"
24/05/20 09:20:40.047 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: *
24/05/20 09:20:40.047 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: ERROR at line 1:
24/05/20 09:20:40.047 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: ORA-65048: error encountered when processing the current DDL statement in
24/05/20 09:20:40.047 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: pluggable database FREEPDB1
24/05/20 09:20:40.047 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: ORA-04021: timeout occurred while waiting to lock object
24/05/20 09:20:40.047 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: Help: https://docs.oracle.com/error-help/db/ora-65048/
24/05/20 09:20:40.047 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: 
24/05/20 09:20:40.047 docker-java-stream-1531082131 INFO OracleIntegrationSuite: STDOUT: 

Additionally, could we gain insights into the current frequency of failures with the OracleIntegrationSuite?

As Spark runs specific CI jobs by identifying related code change, it depends on how often we change the JDBC connectors. I have been working on an umbrella https://issues.apache.org/jira/browse/SPARK-47361 that touches this part frequently, so based on my observation, the rate is very high

Are there statistics or logs available that detail these occurrences?

Unfortunately, No.

Moreover, is there a way to assess if the testing infrastructure might be experiencing high CPU load, potentially leading to slow container image start times?

ORA-04021 can be reproduced on local machines, not necessary has to be testing infrastructures like Github Actions. I'm not sure whether ORA-04021's related to high CPU load or not.

@LucaCanali
Copy link

Thank you for the additional info.
However I am a bit confused as https://github.com/yaooqinn/spark/actions/runs/9155585688/job/25169246357 reports ORA-12541: Cannot connect. No listener at host 10.1.0.45 port 45269 and not ORA-4021
I find strange that you should have an issue with ORA-12541 too.
I cannot reproduce those errors on my local installation machine so far.

@yaooqinn
Copy link
Author

Oh, I see. You only viewed the console log but not the unit-tests.log. The console log's level is OFF, so you can not see the underlying root cause. I noticed ORA-12541, but I ignored it because I thought you were referring to the 'unit-tests.log' as well. ORA-12541 is a client-side error, indicating that the database instance or container has been terminated due to ORA-04021 in this case.

@LucaCanali
Copy link

Thank you for the explanation. I can see that you had ORA-0421 now in https://github.com/yaooqinn/spark/actions/runs/9155585688/job/25169246357
I have been trying to reproduce the error locally by running the tests a few times, but it did not appear in my system

export ENABLE_DOCKER_INTEGRATION_TESTS=1
./build/sbt -Pdocker-integration-tests "docker-integration-tests/testOnly org.apache.spark.sql.jdbc.OracleIntegrationSuite"

I am now trying to reproduce the issue using Github actions.

@yaooqinn
Copy link
Author

@LucaCanali Feel free to ping me here or with a PR in Spark repo if you make any progress.

Thanks.

@LucaCanali
Copy link

I have been attempting to reproduce the issue using GitHub Actions. However, the OracleIntegrationSuite has been running successfully there as well, so far.

@gvenzl
Copy link
Owner

gvenzl commented Sep 30, 2024

Hi @yaooqinn, @LucaCanali,

@yaooqinn, is this still an issue at your end?

@yaooqinn
Copy link
Author

Yes. @gvenzl

@LucaCanali
Copy link

Hi Everyone,
I’m not sure if this impacts the issue, but I wanted to mention that Spark has updated to use Oracle Free 23.6 for testing. This change is detailed in SPARK-50951 and implemented in PR #49605.
Just thought this might be relevant to the ongoing discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants