Tomcat应用程序中分析Connection Closed Exception

问题

我最近负责Java Web应用程序，代码已经编写完成。该应用程序获得适度的高流量，每天上午11点至下午3点的交通高峰时段。该应用程序使用Spring，JPA（Hibernate），MYSQL DB。 Spring已配置为使用tomcat jdbc连接池来建立与DB的连接。（post末尾的配置细节）

在过去几天，在应用程序的高峰负载时间内，由于tomcat对请求没有响应，应用程序一直在停止运行。它需要多次重启tomcat。

通过tomcat catalina.out日志，我注意到了很多

Caused by: java.sql.SQLException: Connection has already been closed. at org.apache.tomcat.jdbc.pool.ProxyConnection.invoke(ProxyConnection.java:117) at org.apache.tomcat.jdbc.pool.JdbcInterceptor.invoke(JdbcInterceptor.java:109) at org.apache.tomcat.jdbc.pool.DisposableConnectionFacade.invoke(DisposableConnectionFacade.java:80) at com.sun.proxy.$Proxy28.prepareStatement(Unknown Source) at org.hibernate.jdbc.AbstractBatcher.getPreparedStatement(AbstractBatcher.java:505) at org.hibernate.jdbc.AbstractBatcher.getPreparedStatement(AbstractBatcher.java:423) at org.hibernate.jdbc.AbstractBatcher.prepareQueryStatement(AbstractBatcher.java:139) at org.hibernate.loader.Loader.prepareQueryStatement(Loader.java:1547) at org.hibernate.loader.Loader.doQuery(Loader.java:673) at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:236) at org.hibernate.loader.Loader.loadCollection(Loader.java:1994) ... 115 more

这些经常在崩溃之前出现。

在这些exception之前更进一步，我注意到在Connection Closedexception之前放弃了很多Connections。

 WARNING: Connection has been abandoned PooledConnection[com.mysql.jdbc.Connection@543c2ab5]:java.lang.Exception at org.apache.tomcat.jdbc.pool.ConnectionPool.getThreadDump(ConnectionPool.java:1065) at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:782) at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:618) at org.apache.tomcat.jdbc.pool.ConnectionPool.getConnection(ConnectionPool.java:188) at org.apache.tomcat.jdbc.pool.DataSourceProxy.getConnection(DataSourceProxy.java:128) at org.hibernate.ejb.connection.InjectedDataSourceConnectionProvider.getConnection(InjectedDataSourceConnectionProvider.java:47) at org.hibernate.jdbc.ConnectionManager.openConnection(ConnectionManager.java:423) at org.hibernate.jdbc.ConnectionManager.getConnection(ConnectionManager.java:144) at org.hibernate.jdbc.AbstractBatcher.prepareQueryStatement(AbstractBatcher.java:139)

这些似乎经常出现在Connection Closedexception之前。这些似乎是日志即将崩溃的第一个症状。

分析

通过日志，我开始查看是否存在可能导致问题的任何连接池配置/ mysql配置。通过一些优秀的文章，展示了生产环境池的调整。链接1和2

通过这些文章，我注意到：

JHanik的文章（链接1）中的以下行提到了这一点

将abandonWhenPercentageFull的值设置为100意味着除非我们达到maxActive限制，否则不会将连接视为放弃。

我认为这在我的情况下可能很重要，因为我看到许多连接被抛弃了。
我的max_connections设置与推荐的设置不匹配（在链接2中）

mysql max_connections应该等于max_active + max_idle

我做了什么

因此，根据文章的建议，我做了以下两件事：

改变放弃当PercentageFull为100
在我的MYSQL服务器中，max_connections设置为500.将其增加到600在我的连接池设置中，max_active为200，max_idle为50.将其更改为max_active = 350，max_idle = 250

这没有帮助

第二天，在高峰时段进行了以下观察：

Tomcat没有下来。该应用程序在高峰时段保持不变。然而，性能变得越来越糟，然后应用程序几乎无法使用，即使它并没有真正下降。
虽然数据库连接池的大小已经增加，但它已经被完全利用了，我可以在一个点上看到350个与DB有效的连接。

最后，我的问题：

很明显，从应用服务器进行数据库连接的方式存在问题。所以我有两个方向来推进这个分析。

我的问题是我应该采取哪些措施？

1.问题不在于连接池设置。 代码是导致问题的原因

代码中可能存在未连接数据库连接的位置。这导致大量连接被打开。

代码使用GenericDao，它在每个Dao类中都有扩展。 GenericDao使用Spring的JpaTemplate来获取EntityManager实例，该实例又用于所有数据库操作。我的理解是使用JpaTemplate处理内部关闭数据库连接的细节。

那么，我应该在哪里寻找可能的连接泄漏？

2.问题在于连接pool / mysql配置参数。 但是，我提出的优化需要进一步调整

如果是，我应该看哪些参数？我是否应该收集一些数据来确定我的连接池的更合适的值。（例如，对于max_active，max_idle，max_connections）

附录：完整的连接池配置

对于OP而言，这已经非常晚了，但未来可能会对其他人有所帮助：

我在具有长期批处理作业的生产环境中遇到类似的问题。问题是如果您的代码需要的连接时间超过属性指定的时间：

name="removeAbandonedTimeout" value="60

你启用了：

然后它会在60秒后处理过程中断开连接。一种可能的解决方法（对我来说不起作用）是启用拦截器：

jdbcInterceptors="ResetAbandonedTimer"

这将为每次发生的读/写重置该连接的放弃计时器。不幸的是，在我的情况下，在读取/写入数据库之前，处理有时仍然需要比超时更长的时间。所以我被迫要么超过超时长度，要么禁用removeAbandonded（我选择了前一个解决方案）。

希望这可以帮助别人，如果他们遇到类似的东西！

我最近被要求调查为什么生产系统有时会下降。我想分享我的发现，因为它涉及事件的关联，以便将JVM tomcat应用程序与上面概述的JDBC问题一起实际崩溃应用程序。这是使用mysql作为后端，所以可能对这种情况最有用，但如果在另一个平台上遇到问题可能会相同。

通过简单地关闭连接并不意味着应用程序被破坏

这是在grails应用程序下，但将相对于所有JVM相关的应用程序：

tomcat/context.xml db配置，注意很小的db pool和removeAbandonedTimeout="10"你们我们想让事情破灭

每分钟都运行一次石英作业，而不是我觉得第一次尝试死的应用程序很重要：

 class Test2Job { static triggers = { cron name: 'test2', cronExpression: "0 0/1 * * * ?" } def testerService def execute() { println "starting job2 ${new Date()}" testerService.basicTest3() } }

现在我们的testService带有评论，请关注评论：

 def dataSource /** * When using this method in quartz all the jdbc settings appear to get ignored * the job actually completes notice huge sleep times compared to basicTest * strange and very different behaviour. * If I add Tester t = Tester.get(1L) and then execute below query I will get * connection pool closed error * @return */ def basicTest2() { int i=1 while (i<21) { def sql = new Sql(dataSource) def query="""select id as id from tester t where id=:id""" def instanceList = sql.rows(query,[id:i as Long],[timeout:90]) sleep(11000) println "-- working on ${i}" def sql1 = new Sql(dataSource) sql1.executeUpdate( "update tester t set t.name=? where t.id=?", ['aa '+i.toString()+' aa', i as Long]) i++ sleep(11000) } println "run ${i} completed" } /** * This is described in above oddity * so if this method is called instead you will see connection closed issues */ def basicTest3() { int i=1 while (i<21) { def t = Tester.get(i) println "--->>>> test3 t ${t.id}" /** * APP CRASHER - This is vital and most important * Without this declared lots of closed connections and app is working * absolutely fine, * The test was originally based on execRun() which returns 6650 records or something * This test query is returned in time and does not appear to crash app * * The moment this method is called and please check what it is currently doing. It is simply * running a huge query which go beyond the time out values and as explained in previous emails MYSQL states * * The app is then non responsive and logs clearly show application is broke */ execRun2() def sql1 = new Sql(dataSource) sleep(10000) sql1.executeUpdate("update tester t set t.name=? where t.id=?",['aa '+i.toString()+' aa', t.id]) sleep(10000) i++ } } def execRun2() { def query="""select new map (t as tester) from Tester t left join t.children c left join t.children c left join c.childrena childrena left join childrena.childrenb childrenb left join childrenb.childrenc childrenc , Tester t2 left join t2.children c2 left join t2.children c2 left join c2.childrena children2a left join children2a.childrenb children2b left join children2b.childrenc children2c where ((c.name like (:name) or childrena.name like (:name) or childrenb.name like (:name) or (childrenc is null or childrenc.name like (:name))) or ( c2.name like (:name) or children2a.name like (:name) or children2b.name like (:name) or (children2c is null or children2c.name like (:name)) )) """ //println "query $query" def results = Tester.executeQuery(query,[name:'aa'+'%'],[timeout:90]) println "Records: ${results.size()}" return results } /** * This is no different to basicTest2 and yet * this throws a connection closed error and notice it is 20 not 20000 * quite instantly a connection closed error is thrown when a .get is used vs * sql = new Sql(..) is a manuall connection * */ def basicTest() { int i=1 while (i<21) { def t = Tester.get(i) println "--- t ${t.id}" sleep(20) //println "publishing event ${event}" //new Thread({ // def event=new PurchaseOrderPaymentEvent(t,t.id) // publishEvent(event) //} as Runnable ).start() i++ } }

只有当查询花费的时间比预期的时间长，但必须有另一个元素时，查询本身必须在MYSQL上进行操作，即使它被杀死了。 MYSQL正在吃掉它处理它。

我想现在发生了什么

 job 1 - hits app -> hits mysql -> (9/10 left) {timeout} -> app killed -> mysql running (9/10) job 2 - hits app -> hits mysql -> (8/10 left) {timeout} -> app killed -> mysql running (8/10) ..... job 10 - hits app -> hits mysql -> (10/10 left) {timeout} -> app killed -> mysql running (10/10) job 11 - hits app ->

如果到目前为止job1尚未完成，那么我们在池中没有任何东西，应用程序现在只是破坏了... jdbc错误抛出等等。没关系，如果它在崩溃后完成..

您可以通过检查mysql来监视正在发生的事情它似乎运行的时间更长，这违背了他们建议的这个值应该做的事情，但是再次可能这不是真的基于任何这个并且与问题有关别处。

虽然测试发现有两种状态：发送数据/发送给客户端：

 | 92 | root | localhost:58462 | test | Query | 80 | Sending data | select tester0_.id as col_0_0_ from tester tester0_ left outer join tester_childa children1_ on test | | 95 | root | localhost:58468 | test | Query | 207 | Sending to client | select tester0_.id as col_0_0_ from tester tester0_ left outer join tester_childa children1_ on test | | 96 | root | localhost:58470 | test | Query | 147 | Sending data | select tester0_.id as col_0_0_ from tester tester0_ left outer join tester_childa children1_ on test | | 97 | root | localhost:58472 | test | Query | 267 | Sending data | select tester0_.id as col_0_0_ from tester tester0_ left outer join tester_childa children1_ on test | | 98 | root | localhost:58474 | test | Sleep | 18 | | NULL | | 99 | root | localhost:58476 | test | Query | 384 | Sending to client | select tester0_.id as col_0_0_ from tester tester0_ left outer join tester_childa children1_ on test | | 100 | root | localhost:58478 | test | Query | 327 | Sending data | select tester0_.id as col_0_0_ from tester tester0_ left outer join tester_childa children1_ on test |

秒后：

 | 91 | root | localhost:58460 | test | Query | 67 | Sending to client | select tester0_.id as col_0_0_ from tester tester0_ left outer join tester_childa children1_ on test | | 92 | root | localhost:58462 | test | Query | 148 | Sending to client | select tester0_.id as col_0_0_ from tester tester0_ left outer join tester_childa children1_ on test | | 97 | root | localhost:58472 | test | Query | 335 | Sending to client | select tester0_.id as col_0_0_ from tester tester0_ left outer join tester_childa children1_ on test | | | 100 | root | localhost:58478 | test | Query | 395 | Sending to client | select tester0_.id as col_0_0_ from tester tester0_ left outer join tester_childa children1_ on test | Seconds after that: (all dead) | 58 | root | localhost | NULL | Query | 0 | starting | show processlist | | 93 | root | localhost:58464 | test | Sleep | 167 | | NULL | | 94 | root | localhost:58466 | test | Sleep | 238 | | NULL | | 98 | root | localhost:58474 | test | Sleep | 74 | | NULL | | 101 | root | localhost:58498 | test | Sleep | 52 | | NULL |

可能需要创建一个脚本来监视进程列表，并且可能是一个更深层的结果集，其中包含运行的精确查询，以确定哪些查询事件正在查杀该应用程序

代码使用GenericDao，它在每个Dao类中都有扩展。 GenericDao使用Spring的JpaTemplate来获取EntityManager实例，该实例又用于所有数据库操作。我的理解是使用JpaTemplate处理内部关闭数据库连接的细节。

这可能是您问题的根源，您不应该使用JpaTemplate来获取EntityManager这将为您提供un unmanaged Entitymanager 。实际上你根本不应该使用JpaTemplate 。

建议基于普通的EntityManager API编写daos，并像往常一样注入EntityManager （使用@PersistenceContext ）。

如果你真的想使用JpaTemplate使用execute方法并传入一个JpaCallback ，它将为你提供一个托管的EntityManager 。

还要确保正确设置事务处理没有正确的tx设置连接将不会关闭，因为spring不知道它应该关闭连接。

在Spring / JPA / Mysql / Tomcat应用程序中分析Connection Closed Exception

Spring：在启动时初始化多个连接池

如何使用JDBC和连接池实现DAO管理器？

在pg_stat_activity中具有“空闲”状态的持久“COMMIT”查询

独立Java程序中的Initialcontext

如何在java中进行连接池？

BoneCP正确用法

Spring应用程序在8小时后失去与MySql的连接。如何正确配置？

来自DUAL的SELECT 1：MySQL

帮助我避免与JPA，Hibernate和MySQL的连接超时

LISTEN / NOTIFY pgconnection关闭java？

在Spring / JPA / Mysql / Tomcat应用程序中分析Connection Closed Exception

Spring：在启动时初始化多个连接池

如何使用JDBC和连接池实现DAO管理器？

在pg_stat_activity中具有“空闲”状态的持久“COMMIT”查询

独立Java程序中的Initialcontext

如何在java中进行连接池？

BoneCP正确用法

Spring应用程序在8小时后失去与MySql的连接。 如何正确配置？

来自DUAL的SELECT 1：MySQL

帮助我避免与JPA，Hibernate和MySQL的连接超时

LISTEN / NOTIFY pgconnection关闭java？

Spring应用程序在8小时后失去与MySql的连接。如何正确配置？