Favor Disjunction over Validation

On July 20, 2014, in Scala, scalaz, by OleTraveler

What is Wrong

For validating data in Scala, Scalaz is the de facto tool for the job. The applicative behavior allows the programmer to accumulate errors from various data flows. However, I believe all the video and blog learning resources I have come across are fundamentally wrong in how they explain using the Validation class. Because of these resources, I have been using Validation incorrectly for many years. Recently, I have come to the conclusion that the Validation class should only be used during the act of accumulating, that is, for the |@| operator, and that Disjunction ( \/ ) should be the main class used for validation and error checking.

An Example

Take, for instance, validating user input for a CreditCard, as described in the image below. In this example, we expect four string inputs: cardholder, number, expiration month and expiration year. The process should validate that the cardholder input contains only ASCII characters and that it has no more than 10 characters, and it should return both errors if both are found. For the number input, the process should first validate that it contains only digits and then validate that the number passes the Luhn algorithm. If the number input is not all digits, we short-circuit and skip the Luhn check. The process should validate individually that the expiration month and expiration year are integers, converting them if they are, and that each is in a valid range. It should then validate the combined month and year to ensure the expiration date is the current month or later. Finally, if no validation errors are found, the process creates and returns a new instance of CreditCard; otherwise it returns a list of all the errors found.

[Image: cc-validation]
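The Luhn check itself is left unimplemented (`???`) in the code in this post. For readers who want something to plug in, here is a minimal sketch using only the Scala standard library. The object name `LuhnSketch` is hypothetical, and rejecting the empty string is an assumption of this sketch rather than something the post specifies.

```scala
object LuhnSketch {

  /** True when `number` is non-empty, consists of ASCII digits only,
    * and its Luhn (mod 10) checksum is a multiple of 10.
    */
  def passesModTen(number: String): Boolean =
    number.nonEmpty && number.forall(c => c >= '0' && c <= '9') && {
      // Walk the digits from right to left; double every second digit,
      // subtracting 9 whenever the doubled value exceeds 9.
      val total = number.reverse.zipWithIndex.map { case (ch, idx) =>
        val digit = ch - '0'
        if (idx % 2 == 1) {
          val doubled = digit * 2
          if (doubled > 9) doubled - 9 else doubled
        } else digit
      }.sum
      total % 10 == 0
    }
}
```

Because the function returns a plain Boolean, it can be wrapped by either style of validating method shown in this post.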

Functions Return Validation, an Incorrect Approach

An incorrect approach, one I have been using for years, is to code all validation functions to return a Validation object where the failure type is a NonEmptyList[ErrorMessage]. Having NonEmptyList[ErrorMessage] as the failure type allows the calling code to easily accumulate errors. In the example below, the validate method is taking 4 input strings and returning a CreditCard object if all the validation checks pass, otherwise it returns one or more ErrorMessages indicating a failure.

  import scalaz._, Scalaz._
 
  type ErrorMessage = String
 
  def asciiOnly(str: String) : Validation[NonEmptyList[ErrorMessage],String] =
    if ("[\\x00-\\x7F]+".r.pattern.matcher(str).matches) str.success
    else "only ascii characters are allowed".failNel
 
  def maxStrLength(max: Int): (String) => Validation[NonEmptyList[ErrorMessage],String] = str =>
    if (str.length <= max) str.success
    else s"can not exceed ${max} characters".failNel
 
  def digitsOnly(str: String) : Validation[NonEmptyList[ErrorMessage], String] =
    if ("""^\d*$""".r.pattern.matcher(str).matches) str.success
    else s"only numbers are allowed".failNel
 
  def toInt(str: String) : Validation[NonEmptyList[ErrorMessage], Int] = try {
    str.toInt.success
  } catch {
    case e: NumberFormatException => "must be a number".failNel
  }
 
  def modTen(str: String) : Validation[NonEmptyList[ErrorMessage], String] = {
    def passesModTen(str: String): Boolean = ???
 
    if (passesModTen(str)) str.success
    else s"invalid number".failNel
  }
 
  def validMonth(m: Int) : Validation[NonEmptyList[ErrorMessage], Int] =
    if (m >= 1 && m <= 12) m.success
    else "invalid month".failNel
 
  def positiveNumber(m: Int) : Validation[NonEmptyList[ErrorMessage], Int] =
    if (m > 0) m.success
    else "must be positive".failNel
 
  def validExpiration(currentMonth: Int, currentYear:Int) : (Int,Int) => Validation[NonEmptyList[ErrorMessage], (Int,Int)] = (month,year) =>
    if (year > currentYear || (year === currentYear && month >= currentMonth)) (month, year).success
    else "card has expired".failNel
 
  case class CreditCard(cardholder: String, number: String, expMonth: Int, expYear:Int)
 
  def validateCardHolder(cardholder: String): Validation[NonEmptyList[ErrorMessage], String] =
    (asciiOnly(cardholder) |@| maxStrLength(10)(cardholder)){ (s ,_)  => s }
 
  def validate(cardholder: String, number: String, expMonth: String, expYear: String) : Validation[NonEmptyList[ErrorMessage], CreditCard] = {
 
    /** Accumulate both */
    val cardHolderV = validateCardHolder(cardholder)
 
    /** Check digits, then modTen */
    val numberV = digitsOnly(number).flatMap(modTen(_))
 
    val validToday = validExpiration(7,2014)
 
    val monthYear = for {
      m <- toInt(expMonth).flatMap(validMonth(_))
      y <- toInt(expYear).flatMap(positiveNumber(_))
      my <- validToday(m,y)
    } yield my
 
    /** If there were any errors, return them.  Otherwise create a credit card */
    (cardHolderV |@| numberV |@| monthYear) { (c,n,my) => CreditCard(c,n,my._1,my._2)}
 
  }
 

The most glaring coding error is the lack of separation of concerns in the methods that do the low-level validation. These methods have taken it upon themselves to return a NonEmptyList[ErrorMessage] in the failure position, which obscures what they should be doing: returning the validated value if the input is valid, or one and only one ErrorMessage if it is not. The methods also should not return a Validation object, since the value in the failure position should not be required to have a Semigroup instance; it is just an ErrorMessage. Note: although the Validation class itself does not require a Semigroup instance for its failure type, its most interesting operation, `|@|`, does. In other words, the only useful form of Validation is one whose failure type has a Semigroup instance.
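To make the Semigroup point concrete, here is a small sketch, assuming Scalaz 7.x on the classpath. The object and value names are hypothetical. A Validation with a plain String failure type can be combined with `|@|` only because Scalaz supplies a Semigroup for String (concatenation); NonEmptyList is simply the more convenient Semigroup for collecting messages.

```scala
import scalaz._, Scalaz._

object SemigroupSketch {
  // Two failures whose error type is a plain String.
  val first: Validation[String, Int]  = Failure("first error;")
  val second: Validation[String, Int] = Failure("second error;")

  // |@| compiles here only because an implicit Semigroup[String] is available;
  // the two messages are appended into a single String.
  val combinedStrings: Validation[String, Int] = (first |@| second)(_ + _)

  // The same failures lifted to NonEmptyList, which keeps the messages separate.
  val combinedNel: ValidationNel[String, Int] =
    (first.toValidationNel |@| second.toValidationNel)(_ + _)
}
```

If the failure type had no Semigroup instance, the `|@|` expressions above would be rejected at compile time, which is the constraint the note describes.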

Different Approach; Default to Disjunction

Scalaz provides the disjunction type \/[+A, +B], which is isomorphic to scala.Either[A,B] but, unlike Either at the time of writing, is right-biased and integrates better with the other Scalaz classes we will be using, such as Validation and Kleisli. Instead of Validation[NonEmptyList[ErrorMessage],T], the return type should be \/[ErrorMessage,T] for any validating method that returns only a single ErrorMessage, or \/[NonEmptyList[ErrorMessage],T] for any validating method that may return one or more ErrorMessage instances.

  import scalaz._, Scalaz._
 
  type ErrorMessage = String
 
  def asciiOnly(str: String): ErrorMessage \/ String =
    if ("[\\x00-\\x7F]+".r.pattern.matcher(str).matches) \/-(str)
    else -\/("only ascii characters are allowed")
 
  def maxStrLength(max: Int): (String) => ErrorMessage \/ String = str =>
    if (str.length <= max) \/-(str)
    else -\/(s"can not exceed ${max} characters")
 
  def digitsOnly(str: String): ErrorMessage \/ String =
    if ( """^\d*$""".r.pattern.matcher(str).matches) \/-(str)
    else -\/(s"only numbers are allowed")
 
  def toInt(str: String): ErrorMessage \/ Int = try {
    \/-(str.toInt)
  } catch {
    case e: NumberFormatException => -\/("must be a number")
  }
 
  def modTen(str: String): ErrorMessage \/ String = {
    def passesModTen(str: String): Boolean = ???
 
    if (passesModTen(str)) \/-(str)
    else -\/(s"invalid number")
  }
 
  def validMonth(m: Int): ErrorMessage \/ Int =
    if (m >= 1 && m <= 12) \/-(m)
    else -\/("invalid month")
 
  def positiveNumber(m: Int): ErrorMessage \/ Int =
    if (m > 0) \/-(m)
    else -\/("must be positive")
 
  def validExpiration(currentMonth: Int, currentYear: Int): (Int, Int) => ErrorMessage \/ (Int, Int) = (month, year) =>
    if (year > currentYear || (year === currentYear && month >= currentMonth)) \/-((month, year))
    else -\/("card has expired")
 
  case class CreditCard(cardholder: String, number: String, expMonth: Int, expYear: Int)
 
  def validateCardHolder(cardholder: String): NonEmptyList[ErrorMessage] \/ String =
    (asciiOnly(cardholder).validation.toValidationNel |@| maxStrLength(10)(cardholder).validation.toValidationNel) { (s, _) => s}
      .disjunction
 
  def validate(cardholder: String, number: String, expMonth: String, expYear: String): NonEmptyList[ErrorMessage] \/ CreditCard = {
 
    /** Accumulate both */
    val cardHolderV = validateCardHolder(cardholder)
 
    /** Check digits, then modTen */
    val numberV = digitsOnly(number).flatMap(modTen(_))
 
    val validToday = validExpiration(7, 2014)
 
    val monthYear = for {
      my <- (toInt(expMonth).flatMap(validMonth(_)).validation.toValidationNel |@|
        toInt(expYear).flatMap(positiveNumber(_)).validation.toValidationNel) {
        (_, _)
      }
        .disjunction
      // validToday fails with a single ErrorMessage; lift it into a NonEmptyList to match the error type of `my`
      validMY <- validToday(my._1, my._2).leftMap(NonEmptyList(_))
    } yield validMY
 
    /** If there were any errors, return them.  Otherwise create a credit card */
    (cardHolderV.validation |@|
      numberV.validation.toValidationNel |@|
      monthYear.validation) { (c, n, my) => CreditCard(c, n, my._1, my._2)}
      .disjunction
 
  }

The resulting code, compared to the first example, is simpler and more concise in the low-level validating methods. Code that calls these validating methods is no longer required to use NonEmptyList as the Semigroup implementation.

Notice how in the methods validateCardHolder and validate, we convert to Validation[NonEmptyList[ErrorMessage],T] only when we expect to accumulate errors and then immediately convert back to \/[NonEmptyList[ErrorMessage],T] when we are done accumulating in that current scope.

Disjunction has a lawful Monad instance. This allows Disjunction to play nicely with scalaz.Kleisli, which lets us chain together functions of type (T) => ErrorMessage \/ U so that the output of the first function feeds the input of the second. I am hoping to produce another post about this subject in the future.
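Until then, here is a rough sketch of what that chaining might look like, assuming Scalaz 7.x. The `Result` alias, the object name and `parseMonth` are hypothetical names introduced for illustration; `toInt` and `validMonth` mirror the methods defined earlier in this post.

```scala
import scalaz._, Scalaz._

object KleisliSketch {
  type ErrorMessage = String
  // Fixing the error type gives a one-parameter type constructor that Kleisli can use.
  type Result[A] = ErrorMessage \/ A

  def toInt(str: String): Result[Int] =
    try \/-(str.toInt)
    catch { case _: NumberFormatException => -\/("must be a number") }

  def validMonth(m: Int): Result[Int] =
    if (m >= 1 && m <= 12) \/-(m)
    else -\/("invalid month")

  // >=> composes left to right and relies on the Monad (Bind) instance of \/.
  val parseMonth: Kleisli[Result, String, Int] =
    Kleisli[Result, String, Int](toInt) >=> Kleisli[Result, Int, Int](validMonth)
}
```

Calling `KleisliSketch.parseMonth.run("7")` runs both steps in order, while an input that fails `toInt` never reaches `validMonth`. That is the same short-circuiting that `flatMap` gives, packaged as a reusable value.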

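\/.flatMap is a valid method, unlike Validation.flatMap, which was deprecated in Scalaz 7.1: Validation is not a monad, because a flatMap must short-circuit on the first failure and therefore cannot agree with the error-accumulating behavior of |@|.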

The conversion from \/[ErrorMessage,String] to Validation[NonEmptyList[ErrorMessage], String] using .validation.toValidationNel is a bit verbose and perhaps too specific to the NonEmptyList implementation. This could be handled at the Scalaz library level; in the short term, we can add an implicit class to reduce the verbosity of the Disjunction conversions. That implicit class may look something like this:

object Disjunction {
 
  implicit class DisjunctionI[A, B](d: \/[A, B]) {

    /** I would prefer a method that required F[_] to have a semigroup typeclass.
     *  That is over my head at the moment. 
     */ 
    def validationF[F[_]](f: A => F[A]): Validation[F[A],B] =
      d match {
        case -\/(a) => Failure(f(a))
        case \/-(b) => Success(b)
      }
 
    def validationNel: Validation[NonEmptyList[A], B] = validationF(NonEmptyList(_))
 
    def leftNel: \/[NonEmptyList[A],B] = d.leftMap(NonEmptyList(_))
 
  }
}
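On the wish expressed in the comment above, one possible way to require a Semigroup is to add an implicit parameter for `Semigroup[F[A]]`. The sketch below assumes Scalaz 7.x and is only one option, not an established Scalaz API. `DisjunctionSyntax`, `DisjunctionOps` and the usage object are hypothetical names.

```scala
import scalaz._, Scalaz._

// Hypothetical names; none of this is part of Scalaz itself.
object DisjunctionSyntax {

  implicit class DisjunctionOps[A, B](val d: A \/ B) extends AnyVal {

    /** Wrap the left value with `f`. The implicit parameter is unused in the body;
      * it only forces F[A] to have a Semigroup, so the result can be used with |@|.
      */
    def validationF[F[_]](f: A => F[A])(implicit S: Semigroup[F[A]]): Validation[F[A], B] =
      d.leftMap(f).validation

    def validationNel: Validation[NonEmptyList[A], B] =
      validationF[NonEmptyList](a => NonEmptyList(a))

    def leftNel: NonEmptyList[A] \/ B = d.leftMap(NonEmptyList(_))
  }
}

object DisjunctionSyntaxUsage {
  import DisjunctionSyntax._

  val month: String \/ Int = -\/("invalid month")
  val year: String \/ Int  = -\/("must be positive")

  // Convert only for the accumulation step, then return to \/.
  val both: NonEmptyList[String] \/ (Int, Int) =
    (month.validationNel |@| year.validationNel)((m, y) => (m, y)).disjunction
}
```

With the constraint in place, calling `validationF` with a wrapper type that has no Semigroup fails at compile time instead of producing a Validation that cannot be accumulated.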

One must also be careful not to pass a (T) => \/[E,U] to flatMap on a \/[NonEmptyList[E],T], because the error types differ. Convert the \/[E,U] to a \/[NonEmptyList[E],U] first, as shown in the validate method: validToday(my._1, my._2).leftMap(NonEmptyList(_)). This verbosity can also be reduced with an implicit class.

Code for this blog can be found in a Gist. For more examples favoring disjunction, consult my port of Scalaz Validation Contrib.
