
default suggestion check error in isContainedIn #161

Closed
poudelankit opened this issue Oct 5, 2023 · 2 comments
Labels: bug (Something isn't working), duplicate (This issue or pull request already exists)

Comments

@poudelankit

Code for the default ConstraintSuggestion:

```python
suggestions = ConstraintSuggestionRunner(spark).onData(df).addConstraintRule(DEFAULT()).run()
for suggestion in suggestions['constraint_suggestions']:
    print(suggestion['code_for_constraint'])
```

Result of the default suggestion
The default constraint suggestion returns the following as the value under `'code_for_constraint'`:

```python
.isContainedIn("Category #1", ["Dental Surgery", "Laboratory"], lambda x: x >= 0.98, "It should be above 0.98!")
```

which, when passed to `addCheck`, results in an error.

Check implementation:

```python
check = Check(spark, CheckLevel.Error, "Manual Check")
verification_runner = VerificationSuite(spark).onData(df).addCheck(
    check.isContainedIn("Category #1", ["Dental Surgery", "Laboratory"],
                        lambda x: x >= 0.98, "It should be above 0.98!")
)
verification_result = verification_runner.run()
df_checked = VerificationResult.checkResultsAsDataFrame(spark, verification_result)
df_checked.show(truncate=False)
```

Error:

```
TypeError: isContainedIn() takes 3 positional arguments but 5 were given
```
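For reference, the mismatch can be reproduced without Spark: the Python `isContainedIn` wrapper only accepts `(self, column, allowed_values)`, so the two extra arguments emitted by the suggestion trigger the `TypeError`. A minimal sketch (the stub `Check` class below is illustrative, not the real pydeequ class):

```python
# Illustrative stub: mimics the shipped signature of the Python
# Check.isContainedIn wrapper, which takes no assertion or hint arguments.
class Check:
    def isContainedIn(self, column, allowed_values):
        return self

check = Check()
try:
    # The suggested constraint passes 4 arguments (5 counting self)...
    check.isContainedIn("Category #1", ["Dental Surgery", "Laboratory"],
                        lambda x: x >= 0.98, "It should be above 0.98!")
    msg = ""
except TypeError as e:
    # ...so Python rejects the call, e.g.
    # "isContainedIn() takes 3 positional arguments but 5 were given"
    msg = str(e)
print(msg)
```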


@chenliu0831 (Contributor)

This should be fixed by #157?

@chenliu0831 chenliu0831 added bug Something isn't working duplicate This issue or pull request already exists labels Oct 5, 2023
@poudelankit (Author)

Porting the Scala implementation to Python worked for me.

The change was made in checks.py:

```python
# requires: from pydeequ.scala_utils import ScalaFunction1
def isContainedIn(self, column, allowed_values, assertion=None, hint=None):
    """
    Asserts that every non-null value in a column is contained in a set of predefined values.

    :param column: Column to run the assertion on
    :param allowed_values: Allowed values for the column
    :param assertion: Function that receives a double input parameter and returns a boolean
    :param hint: A hint to provide additional context why a constraint could have failed
    :return: self
    """
    # Build a Java String[] from the Python list of allowed values
    arr = self._spark_session.sparkContext._gateway.new_array(
        self._jvm.java.lang.String, len(allowed_values)
    )
    for i, value in enumerate(allowed_values):
        arr[i] = value
    if assertion:
        assertion_func = ScalaFunction1(
            self._spark_session.sparkContext._gateway, assertion
        )
        hint = self._jvm.scala.Option.apply(hint)
        self._Check = self._Check.isContainedIn(column, arr, assertion_func, hint)
    else:
        self._Check = self._Check.isContainedIn(column, arr)
    return self
```
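As a side note on the assertion semantics (the data below is illustrative, not from the issue): Deequ computes the fraction of rows whose value falls in the allowed set and passes that double to the assertion, so `lambda x: x >= 0.98` requires at least 98% containment. A minimal sketch of that check in plain Python:

```python
# Illustrative data; Deequ computes this fraction over the DataFrame column.
values = ["Dental Surgery", "Laboratory", "Dental Surgery", "Unknown"]
allowed = {"Dental Surgery", "Laboratory"}

contained_fraction = sum(v in allowed for v in values) / len(values)
assertion = lambda x: x >= 0.98

print(contained_fraction)              # 0.75
print(assertion(contained_fraction))   # False: below the 0.98 threshold
```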
