Commit a6b340e

chore: replace legacy datetime rebase tests with current scan coverage [iceberg] (#3605)
1 parent 254a81c commit a6b340e

3 files changed

Lines changed: 33 additions & 168 deletions

docs/source/contributor-guide/parquet_scans.md

Lines changed: 4 additions & 4 deletions
@@ -49,10 +49,10 @@ The following features are not supported by either scan implementation, and Come
 
 The following shared limitation may produce incorrect results without falling back to Spark:
 
-- No support for datetime rebasing detection or the `spark.comet.exceptionOnDatetimeRebase` configuration. When
-  reading Parquet files containing dates or timestamps written before Spark 3.0 (which used a hybrid
-  Julian/Gregorian calendar), dates/timestamps will be read as if they were written using the Proleptic Gregorian
-  calendar. This may produce incorrect results for dates before October 15, 1582.
+- No support for datetime rebasing. When reading Parquet files containing dates or timestamps written before
+  Spark 3.0 (which used a hybrid Julian/Gregorian calendar), dates/timestamps will be read as if they were
+  written using the Proleptic Gregorian calendar. This may produce incorrect results for dates before
+  October 15, 1582.
 
 The `native_datafusion` scan has some additional limitations, mostly related to Parquet metadata. All of these
 cause Comet to fall back to Spark.
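
Note on the limitation documented above: the mismatch can be reproduced with plain JDK classes. The sketch below is not Comet or Spark code, and the names `CalendarMismatch`, `hybridEpochDays`, and `readAsProleptic` are hypothetical, introduced only for illustration. It assumes that a legacy writer stored a date as the number of days since 1970-01-01 computed under the hybrid Julian/Gregorian calendar (modeled here by `java.util.GregorianCalendar` with its default cutover), and that a reader without rebasing interprets the same day count under the Proleptic Gregorian calendar (modeled by `java.time.LocalDate`).

```scala
import java.time.LocalDate
import java.util.{GregorianCalendar, TimeZone}

// Hypothetical helper object, for illustration only; not part of Comet or Spark.
object CalendarMismatch {
  private val MillisPerDay = 86400000L

  /** Days since 1970-01-01 for a year/month/day label interpreted in the
    * hybrid Julian/Gregorian calendar (Julian before 1582-10-15). */
  def hybridEpochDays(year: Int, month: Int, day: Int): Long = {
    val cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"))
    cal.clear()                    // reset all fields so the time of day is midnight
    cal.set(year, month - 1, day)  // Calendar months are zero-based
    Math.floorDiv(cal.getTimeInMillis, MillisPerDay)
  }

  /** The label a reader without rebasing assigns to the same day count,
    * using the Proleptic Gregorian calendar. */
  def readAsProleptic(epochDays: Long): LocalDate =
    LocalDate.ofEpochDay(epochDays)
}
```

Under these assumptions, the label 1582-10-04 (the last Julian day before the cutover) comes back as 1582-10-14, a ten-day shift, while 1582-10-15 and later dates round-trip unchanged. This is why the limitation only affects dates before October 15, 1582.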

spark/src/test/scala/org/apache/comet/parquet/ParquetReadSuite.scala

Lines changed: 29 additions & 0 deletions
@@ -1815,6 +1815,35 @@ class ParquetReadV1Suite extends ParquetReadSuite with AdaptiveSparkPlanHelper {
     })
   }
 
+  test("reading ancient dates before 1582") {
+    // Verify that legacy dates (before 1582-10-15) are read without error.
+    // Comet does not support datetime rebasing, so these dates are read as if they were
+    // written using the Proleptic Gregorian calendar (no rebase, no exception).
+    val file =
+      getResourceParquetFilePath("test-data/before_1582_date_v3_2_0.snappy.parquet")
+
+    Seq(CometConf.SCAN_NATIVE_ICEBERG_COMPAT, CometConf.SCAN_NATIVE_DATAFUSION).foreach {
+      scanImpl =>
+        withSQLConf(CometConf.COMET_NATIVE_SCAN_IMPL.key -> scanImpl) {
+          val df = spark.read.parquet(file)
+
+          // Verify Comet scan is in the plan
+          val plan = df.queryExecution.executedPlan
+          checkCometOperators(plan)
+
+          // Verify all 8 rows are read and contain dates before 1582
+          val rows = df.collect()
+          assert(rows.length == 8, s"Expected 8 rows with $scanImpl, got ${rows.length}")
+          rows.foreach { row =>
+            val date = row.getDate(0)
+            assert(
+              date.toLocalDate.getYear < 1582,
+              s"Expected date before 1582 with $scanImpl, got $date")
+          }
+        }
+    }
+  }
+
 }
 
 // ignored: native_comet scan is no longer supported

spark/src/test/scala/org/apache/spark/sql/comet/ParquetDatetimeRebaseSuite.scala

Lines changed: 0 additions & 164 deletions
This file was deleted.
