Member enrichment llm lfx 1712 #2688
base: main
Conversation
Walkthrough

The pull request introduces substantial changes to the member enrichment process, including the creation of a new `llmPromptHistory` database table.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Worker
    participant LLMService
    participant Database
    User->>Worker: Start Enrichment Process
    Worker->>Database: Fetch Enrichable Members
    Database-->>Worker: Return Members
    Worker->>LLMService: Process Member Sources
    LLMService-->>Worker: Return Processed Data
    Worker->>Database: Update Member Data
    Database-->>Worker: Confirmation
    Worker->>User: Enrichment Complete
```
Actionable comments posted: 18
🧹 Outside diff range and nitpick comments (27)
services/libs/types/src/enums/llm.ts (1)
`1-4`: **Add documentation for model capabilities and versioning strategy.**

The enum implementation is correct, but consider adding JSDoc comments to document:
- Capabilities and differences between the models
- Versioning strategy for handling model updates
- Any specific requirements or limitations of each model
Example documentation:
```typescript
/**
 * Supported LLM model types for AWS Bedrock Runtime.
 * @see https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html
 */
export enum LlmModelType {
  /**
   * Claude 3 Sonnet - Balanced performance and cost.
   * Best for: [specific use cases]
   * Limitations: [any known limitations]
   */
  CLAUDE_3_5_SONNET = 'anthropic.claude-3-5-sonnet-20240620-v1:0',

  /**
   * Claude 3 Opus - Highest capability model.
   * Best for: [specific use cases]
   * Limitations: [any known limitations]
   */
  CLAUDE_3_OPUS = 'anthropic.claude-3-opus-20240229-v1:0',
}
```

services/apps/premium/members_enrichment_worker/src/main.ts (1)
Line range hint `7-16`: **Add error handling and graceful shutdown.**

While the initialization flow is correct, consider these essential improvements:
- Add try-catch blocks for error handling
- Implement graceful shutdown
- Add validation for the CROWD_EDITION environment variable
Consider this implementation:
```diff
 setImmediate(async () => {
+  try {
     await svc.init()
     await scheduleMembersEnrichment()

-  if (process.env['CROWD_EDITION'] === Edition.LFX) {
+    const edition = process.env['CROWD_EDITION']
+    if (edition && edition === Edition.LFX) {
       await scheduleMembersLFIDEnrichment()
     }

     await svc.start()
+  } catch (error) {
+    console.error('Failed to initialize service:', error)
+    process.exit(1)
+  }
 })
+
+process.on('SIGTERM', async () => {
+  try {
+    await svc.stop()
+    process.exit(0)
+  } catch (error) {
+    console.error('Error during shutdown:', error)
+    process.exit(1)
+  }
+})
```

services/apps/premium/members_enrichment_worker/src/activities/lf-auth0/getLFIDEnrichableMembers.ts (1)
Line range hint `11-14`: **Enhance error handling with more context**

The current error handling simply wraps and rethrows the error. Consider adding more context about the operation being performed.
```diff
 try {
   const db = svc.postgres.reader
   rows = await fetchMembersForLFIDEnrichment(db, limit, afterId)
 } catch (err) {
-  throw new Error(err)
+  throw new Error(`Failed to fetch enrichable members for LFID: ${err.message}`)
 }
```

services/libs/types/src/index.ts (1)
`64-64`: **Add newline at end of file.**

Consider adding a newline at the end of the file to follow best practices.
```diff
 export * from './llm'
+
```
services/apps/premium/members_enrichment_worker/src/activities/syncEnrichedData.ts (2)
Line range hint `21-31`: **Improve error handling and function signature**

The current implementation has several issues that should be addressed:
- Error handling loses the original stack trace
- Function returns `null` despite `void` return type
- Documentation could be more descriptive
Consider this improved implementation:
```diff
 /*
- syncMembersToOpensearch is a Temporal activity that sync a newly enriched member
- in database to OpenSearch.
+ * Temporal activity that synchronizes an enriched member from the database to OpenSearch.
+ * @param input - The member ID to synchronize
+ * @throws {Error} If synchronization fails
 */
 export async function syncMembersToOpensearch(input: string): Promise<void> {
   try {
     syncMembers.syncMembers(input)
   } catch (err) {
-    throw new Error(err)
+    throw err instanceof Error ? err : new Error(String(err))
   }
-
-  return null
 }
```
Line range hint `33-43`: **Apply consistent improvements to organization sync function**

The organization sync function has the same issues as the member sync function.
Apply similar improvements:
```diff
 /*
- syncOrganizationsToOpensearch is a Temporal activity that sync newly enriched
- organizations in database to OpenSearch.
+ * Temporal activity that synchronizes enriched organizations from the database to OpenSearch.
+ * @param input - Array of organization IDs to synchronize
+ * @throws {Error} If synchronization fails
 */
 export async function syncOrganizationsToOpensearch(input: string[]): Promise<void> {
   try {
     syncOrganizations.syncOrganizations(input)
   } catch (err) {
-    throw new Error(err)
+    throw err instanceof Error ? err : new Error(String(err))
   }
-
-  return null
 }
```

services/libs/types/src/premium/enrichment.ts (2)
`34-41`: **Consider using a more specific type for date fields**

The interface correctly defines organization data structure, but date fields could benefit from more specific typing to ensure proper date handling.
Consider this improvement:
```diff
 export interface IMemberOrganizationData {
   orgId: string
   orgName: string
   jobTitle: string
-  dateStart: string
-  dateEnd: string
+  dateStart: string | null // ISO 8601 date string
+  dateEnd: string | null // ISO 8601 date string
   source: string
 }
```
`43-54`: **LGTM! Well-structured interface with clear documentation**

The interface effectively aggregates member data with clear section comments and appropriate type usage. The `attributes` field using `Record<string, Record<string, unknown>>` provides good flexibility while maintaining type safety.

Consider adding a type for the attribute values to improve type safety:
```typescript
type AttributeValue = string | number | boolean | null;
type Attributes = Record<string, Record<string, AttributeValue>>;
```

services/apps/premium/members_enrichment_worker/src/activities/lf-auth0/authenticateLFAuth0.ts (1)
Line range hint `29-33`: **Add environment variable validation.**

The code relies on several critical environment variables for Auth0 configuration. Consider adding validation at startup to fail fast if any required variables are missing.
```diff
+const REQUIRED_ENV_VARS = [
+  'CROWD_LFX_AUTH0_DOMAIN',
+  'CROWD_LFX_AUTH0_CLIENT_ID',
+  'CROWD_LFX_AUTH0_CLIENT_ASSERTION_SIGNING_KEY',
+  'CROWD_LFX_AUTH0_AUDIENCE',
+];
+
+function validateEnvVars() {
+  const missing = REQUIRED_ENV_VARS.filter(key => !process.env[key]);
+  if (missing.length > 0) {
+    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
+  }
+}
+
 async function getTokenFromAuth0(): Promise<ITokenWithExpiration> {
+  validateEnvVars();
+
   const auth = new AuthenticationClient({
     domain: process.env['CROWD_LFX_AUTH0_DOMAIN'],
     clientId: process.env['CROWD_LFX_AUTH0_CLIENT_ID'],
     clientAssertionSigningKey: process.env['CROWD_LFX_AUTH0_CLIENT_ASSERTION_SIGNING_KEY'],
   })
```

services/apps/premium/members_enrichment_worker/src/activities/lf-auth0/enrichLFAuth0.ts (2)
Line range hint `17-19`: **Improve error handling to preserve stack traces.**

The current error handling pattern loses the original error stack trace, which could make debugging more difficult. Consider using the `Error.cause` property for proper error chaining.

Apply this pattern in both error handlers:
```diff
 } catch (err) {
-  throw new Error(err)
+  throw new Error('Failed to get identities', { cause: err })
 }
```

```diff
 } catch (err) {
-  throw new Error(err)
+  throw new Error('Failed to update member with enrichment data', { cause: err })
 }
```

Also applies to: 31-33
Line range hint `54-72`: **Consider adding timeout and error handling for the fetch call.**

The `mergeMembers` function makes an HTTP request without:
- Request timeout
- Proper error handling for network failures
- Response body parsing for error details
Consider enhancing the implementation:
```diff
 export async function mergeMembers(
   primaryMemberId: string,
   secondaryMemberId: string,
   tenantId: string,
 ): Promise<void> {
-  const res = await fetch(
-    `${process.env['CROWD_API_SERVICE_URL']}/tenant/${tenantId}/member/${primaryMemberId}/merge`,
-    {
-      method: 'PUT',
-      headers: {
-        Authorization: `Bearer ${process.env['CROWD_API_SERVICE_USER_TOKEN']}`,
-        'Content-Type': 'application/json',
-      },
-      body: JSON.stringify({
-        memberToMerge: secondaryMemberId,
-      }),
-    },
-  )
+  try {
+    const controller = new AbortController();
+    const timeoutId = setTimeout(() => controller.abort(), 30000);
+
+    const res = await fetch(
+      `${process.env['CROWD_API_SERVICE_URL']}/tenant/${tenantId}/member/${primaryMemberId}/merge`,
+      {
+        method: 'PUT',
+        headers: {
+          Authorization: `Bearer ${process.env['CROWD_API_SERVICE_USER_TOKEN']}`,
+          'Content-Type': 'application/json',
+        },
+        body: JSON.stringify({
+          memberToMerge: secondaryMemberId,
+        }),
+        signal: controller.signal,
+      },
+    );
+
+    clearTimeout(timeoutId);

-  if (res.status !== 200) {
-    throw new Error(`Failed to merge member ${primaryMemberId} with ${secondaryMemberId}!`)
+    if (!res.ok) {
+      const errorBody = await res.text();
+      throw new Error(
+        `Failed to merge member ${primaryMemberId} with ${secondaryMemberId}. Status: ${res.status}. Details: ${errorBody}`
+      );
+    }
+  } catch (err) {
+    if (err.name === 'AbortError') {
+      throw new Error('Request timeout while merging members', { cause: err });
+    }
+    throw new Error('Failed to merge members', { cause: err });
+  }
 }
```

services/libs/types/src/llm.ts (2)
`3-19`: **LGTM! Consider adding JSDoc comments.**

The base interfaces are well-structured with a clear hierarchy. The use of generics in `ILlmResult<T>` provides good type safety.

Consider adding JSDoc comments to document:
- The purpose of each interface
- The meaning of the `entityId` and `metadata` fields in `ILlmHistoryEntry`
- The units for `inputTokenCount` in `ILlmResponse`
```diff
+/**
+ * Represents an entry in the LLM prompt history, extending the base response with tracking information.
+ */
 export interface ILlmHistoryEntry extends ILlmResponse {
   type: LlmQueryType
+  /** ID of the entity (e.g., member, organization) this prompt was about */
   entityId?: string
+  /** Additional contextual information about this prompt */
   metadata?: Record<string, unknown>
 }
```
`35-45`: **Consider implementing a pricing update mechanism.**

The comment "to estimate costs - these numbers can change" suggests volatile pricing. Having this hardcoded in the codebase will require code changes for price updates.
Consider:
- Moving pricing to a configuration service or database
- Implementing a price update mechanism
- Adding price monitoring and alerting for cost control
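For the first option, a minimal sketch of a config-backed pricing lookup (the provider class, table shape, and refresh interval are illustrative assumptions, not part of this PR):

```typescript
// Sketch: load model pricing from a configuration source instead of code,
// refreshing at most once per TTL so price updates need no redeploy.
interface ModelPricing {
  inputCostPer1kTokens: number
  outputCostPer1kTokens: number
}

class LlmPricingProvider {
  private cache = new Map<string, ModelPricing>()
  private lastRefresh = 0

  constructor(
    // the loader is injected: it could read from a config service or a database
    private readonly loadPricing: () => Promise<Record<string, ModelPricing>>,
    private readonly ttlMs = 60 * 60 * 1000,
  ) {}

  async getPricing(modelId: string): Promise<ModelPricing | undefined> {
    if (Date.now() - this.lastRefresh > this.ttlMs) {
      this.cache = new Map(Object.entries(await this.loadPricing()))
      this.lastRefresh = Date.now()
    }
    return this.cache.get(modelId)
  }
}
```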
services/apps/premium/members_enrichment_worker/src/activities/lf-auth0/githubIdentities.ts (3)
Line range hint `12-39`: **Enhance error handling in database operations**

The current error handling in both database functions simply wraps and rethrows errors, losing context and potentially exposing sensitive information.
Consider implementing more robust error handling:
```diff
 export async function getGithubIdentitiesWithoutSourceId(
   limit: number,
   afterId: string,
   afterUsername: string,
 ): Promise<IMemberIdentity[]> {
   let rows: IMemberIdentity[] = []
   try {
     const db = svc.postgres.reader
     rows = await getGithubIdentities(db, limit, afterId, afterUsername)
   } catch (err) {
-    throw new Error(err)
+    throw new Error(`Failed to fetch GitHub identities: ${err.message}`)
   }
   return rows
 }

 export async function updateIdentitySourceId(
   identity: IMemberIdentity,
   sourceId: string,
 ): Promise<void> {
   try {
     const db = svc.postgres.writer
     await updateSourceId(db, identity, sourceId)
   } catch (err) {
-    throw new Error(err)
+    throw new Error(
+      `Failed to update source ID for identity ${identity.id}: ${err.message}`
+    )
   }
 }
```
Line range hint `44-71`: **Improve environment variable handling and token security**

The function directly accesses environment variables without validation and splits the token string without type checking.
Consider these security improvements:
```diff
 export async function checkTokens(): Promise<boolean> {
+  const tokens = process.env['CROWD_GITHUB_PERSONAL_ACCESS_TOKENS']
+  if (!tokens) {
+    throw new Error('GitHub tokens not configured')
+  }
+
   const cache = new RedisCache(`lfx-auth0`, svc.redis, svc.log)
   const tokenRotator = new GithubTokenRotator(
     cache,
-    process.env['CROWD_GITHUB_PERSONAL_ACCESS_TOKENS'].split(','),
+    tokens.split(',').filter(token => token.trim().length > 0),
   )
```

Also, consider:
- Moving token configuration to a secure configuration service
- Implementing token validation before usage
- Adding token rotation logs for audit trails
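As a sketch of the validation idea (the helpers are hypothetical; the Bearer header mirrors the fetch call used elsewhere in this file):

```typescript
// Probe each configured token against the GitHub API before putting it into
// rotation; a 401 means the token is revoked or expired.
async function isTokenValid(token: string): Promise<boolean> {
  const response = await fetch('https://api.github.com/user', {
    headers: { Authorization: `Bearer ${token}` },
  })
  return response.status !== 401
}

// Filter the configured tokens down to the ones GitHub still accepts
async function validTokens(tokens: string[]): Promise<string[]> {
  const checks = await Promise.all(tokens.map(isTokenValid))
  return tokens.filter((_, i) => checks[i])
}
```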
Line range hint `73-108`: **Improve GitHub API interaction robustness**

The function needs several improvements for better reliability and maintainability.
Consider these enhancements:
```diff
 export async function findGithubSourceId(username: string): Promise<string | null> {
+  const RATE_LIMIT_STATUS = 429;
+  const REQUEST_TIMEOUT = 5000;
+
   try {
     const cache = new RedisCache(`lfx-auth0`, svc.redis, svc.log)
     const tokenRotator = new GithubTokenRotator(
       cache,
       process.env['CROWD_GITHUB_PERSONAL_ACCESS_TOKENS'].split(','),
     )

     const token = await tokenRotator.getToken()
+    const controller = new AbortController();
+    const timeoutId = setTimeout(() => controller.abort(), REQUEST_TIMEOUT);

     const response = await fetch(`https://api.github.com/users/${username}`, {
       headers: {
         Authorization: `Bearer ${token}`,
       },
+      signal: controller.signal,
     })

+    clearTimeout(timeoutId);
+
     if (!response.ok) {
-      console.error('HTTP error', response.status)
-      if (response.status === 429) {
+      svc.log.error('GitHub API error', { status: response.status, username })
+      if (response.status === RATE_LIMIT_STATUS) {
         throw new Error('Rate limit exceeded')
       }
       return null
```

Additional suggestions:
- Implement retry mechanism for transient failures
- Add request/response logging for debugging
- Consider using an API client library for better error handling
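For the retry suggestion, a minimal generic wrapper with exponential backoff (the helper and its defaults are assumptions, not PR code):

```typescript
// Retry a transient operation up to maxAttempts times, doubling the delay
// between attempts (500ms, 1s, 2s, ...); rethrows the last error on failure.
async function withRetries<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn()
    } catch (err) {
      lastError = err
      if (attempt < maxAttempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt))
      }
    }
  }
  throw lastError
}

// usage: const sourceId = await withRetries(() => findGithubSourceId(username))
```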
services/apps/premium/members_enrichment_worker/src/bin/onboarding.ts (2)
`8-14`: **Consider using configuration file for environment setup**

While the current setup works, hardcoding environment variables and service configurations directly in the code makes it less maintainable and harder to modify for different environments.
Consider moving these configurations to a separate config file:
```diff
-process.env['CROWD_TEMPORAL_TASKQUEUE'] = 'not-important'
-svc.config.envvars = []
-svc.config.producer = { enabled: false }
-svc.config.redis = { enabled: true }
-svc.config.temporal = { enabled: false }
-svc.config.questdb = { enabled: false }
-svc.options.opensearch = { enabled: false }
+import { onboardingConfig } from '../config/onboarding'
+
+svc.config = onboardingConfig
```
`62-62`: **Remove commented-out debugging code**

Remove the commented-out test data as it's not needed in production code.
```diff
-  // let members = ['8db6c61e-f8b0-400e-ac5d-cb550d8740c9']
```
services/apps/premium/members_enrichment_worker/src/workflows/enrichMember.ts (1)
Line range hint `33-105`: **Well-structured implementation with robust error handling**

The changes effectively implement the new member enrichment architecture while maintaining good separation of concerns and error handling. The workflow handles multiple sources, caching, and data processing in a clean and maintainable way.
A few suggestions for future improvements:
- Consider adding logging for better observability of the enrichment process
- Add comments explaining the new processing flow for better maintainability
services/archetypes/worker/src/index.ts (2)
`233-235`: **Use path.join for cross-platform compatibility.**

Using `path.resolve` with string concatenation can be fragile across different operating systems. Consider using `path.join` for more robust path handling.
```diff
 const workflowBundle = await bundleWorkflowCode({
-  workflowsPath: path.resolve('./src/workflows'),
+  workflowsPath: path.join(__dirname, 'workflows'),
 })
```
`245-245`: **Consider a more type-safe approach for loading activities.**

Using `require` with `path.resolve` is not type-safe and could lead to runtime errors. Consider importing activities directly or using a more structured approach.
```diff
-  activities: require(path.resolve('./src/activities')),
+  activities: await import(path.join(__dirname, 'activities')),
```

services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts (1)
`282-289`: **Consider adding input sanitization for email arrays.**

While the implementation handles both formats well, consider adding sanitization for the array case:
```diff
 let emails: string[]
 if (Array.isArray(data.email)) {
-  emails = data.email
+  emails = data.email.filter(isEmail)
 } else {
   emails = data.email.split(',').filter(isEmail)
 }
```

This ensures consistent email validation regardless of the input format.
services/libs/common_services/src/services/llm.service.ts (2)
`152-153`: **Remove unnecessary ESLint disable comment**

The ESLint disable comment `@typescript-eslint/no-explicit-any` appears to be unnecessary here, as there is no usage of `any` in the function parameters. Remove the comment to keep the code clean.

Apply this diff to remove the unnecessary comment:
```diff
   prompt: string,
-  // eslint-disable-next-line @typescript-eslint/no-explicit-any
 ): Promise<ILlmResult<LlmMemberEnrichmentResult>> {
```
`97-99`: **Define appropriate types instead of using `any`**

Casting `settings.arguments` to `any` and disabling the ESLint rule `@typescript-eslint/no-explicit-any` is not recommended. Define appropriate types for `settings.arguments` to maintain type safety.

Consider updating the code as follows:
```diff
 const command = new InvokeModelCommand({
   body: JSON.stringify({
     messages: [
       {
         role: 'user',
         content: [
           {
             type: 'text',
             text: prompt,
           },
         ],
       },
     ],
-    // eslint-disable-next-line @typescript-eslint/no-explicit-any
-    ...(settings.arguments as any),
+    ...settings.arguments,
   }),
   modelId: settings.modelId,
   accept: 'application/json',
   contentType: 'application/json',
 })
```
Then, ensure that `settings.arguments` is properly typed in `ILlmSettings`. For example:

```typescript
export interface ILlmSettings {
  modelId: string
  arguments?: Record<string, unknown> // Define a more specific type if possible
}
```

services/apps/premium/members_enrichment_worker/src/activities/enrichment.ts (3)
`155-155`: **Use optional chaining for cleaner conditional checks**

You can simplify the condition in line 155 by using optional chaining. Instead of `if (cache && cache.data)`, use `if (cache?.data)` for cleaner and more concise code.

Apply this diff:
```diff
-  if (cache && cache.data) {
+  if (cache?.data) {
```

🧰 Tools
🪛 Biome
[error] 155-155: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
`195-195`: **Address the TODO: Implement data squasher using LLM**

There's a TODO comment indicating that the data squasher using LLM needs to be implemented. Ensuring this functionality is crucial for member data consolidation.
Would you like assistance in implementing the data squashing logic using LLM? I can help by providing a sample implementation or opening a GitHub issue to track this task.
`331-331`: **Address the TODO: Update member data with LLM results**

At line 331, there's a TODO to update the member data based on the LLM's output. Implementing this will ensure that high-confidence changes are correctly applied to the member profile.
Do you need assistance in writing the code to update the member data? I can help draft the implementation or open a GitHub issue for further tracking.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)
`pnpm-lock.yaml` is excluded by `!**/pnpm-lock.yaml`
📒 Files selected for processing (31)
- `backend/src/database/migrations/V1731052735__llm-prompt-history.sql` (1 hunks)
- `backend/src/database/migrations/V1731484783__drop-member-deprecated-columns.sql` (1 hunks)
- `services/apps/premium/members_enrichment_worker/package.json` (2 hunks)
- `services/apps/premium/members_enrichment_worker/src/activities.ts` (2 hunks)
- `services/apps/premium/members_enrichment_worker/src/activities/enrichment.ts` (4 hunks)
- `services/apps/premium/members_enrichment_worker/src/activities/getMembers.ts` (1 hunks)
- `services/apps/premium/members_enrichment_worker/src/activities/lf-auth0/authenticateLFAuth0.ts` (1 hunks)
- `services/apps/premium/members_enrichment_worker/src/activities/lf-auth0/enrichLFAuth0.ts` (1 hunks)
- `services/apps/premium/members_enrichment_worker/src/activities/lf-auth0/getEnrichmentLFAuth0.ts` (1 hunks)
- `services/apps/premium/members_enrichment_worker/src/activities/lf-auth0/getLFIDEnrichableMembers.ts` (1 hunks)
- `services/apps/premium/members_enrichment_worker/src/activities/lf-auth0/githubIdentities.ts` (1 hunks)
- `services/apps/premium/members_enrichment_worker/src/activities/syncEnrichedData.ts` (1 hunks)
- `services/apps/premium/members_enrichment_worker/src/bin/onboarding.ts` (1 hunks)
- `services/apps/premium/members_enrichment_worker/src/main.ts` (1 hunks)
- `services/apps/premium/members_enrichment_worker/src/schedules/getMembersToEnrich.ts` (1 hunks)
- `services/apps/premium/members_enrichment_worker/src/service.ts` (1 hunks)
- `services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts` (1 hunks)
- `services/apps/premium/members_enrichment_worker/src/types.ts` (1 hunks)
- `services/apps/premium/members_enrichment_worker/src/workflows/enrichMember.ts` (3 hunks)
- `services/archetypes/worker/src/index.ts` (2 hunks)
- `services/libs/common_services/package.json` (1 hunks)
- `services/libs/common_services/src/repos/llmPromptHistory.repo.ts` (1 hunks)
- `services/libs/common_services/src/services/index.ts` (1 hunks)
- `services/libs/common_services/src/services/llm.service.ts` (1 hunks)
- `services/libs/data-access-layer/src/old/apps/premium/members_enrichment_worker/index.ts` (4 hunks)
- `services/libs/types/src/enums/activities.ts` (0 hunks)
- `services/libs/types/src/enums/index.ts` (1 hunks)
- `services/libs/types/src/enums/llm.ts` (1 hunks)
- `services/libs/types/src/index.ts` (1 hunks)
- `services/libs/types/src/llm.ts` (1 hunks)
- `services/libs/types/src/premium/enrichment.ts` (1 hunks)
💤 Files with no reviewable changes (1)
- services/libs/types/src/enums/activities.ts
✅ Files skipped from review due to trivial changes (4)
- backend/src/database/migrations/V1731484783__drop-member-deprecated-columns.sql
- services/apps/premium/members_enrichment_worker/src/activities/lf-auth0/getEnrichmentLFAuth0.ts
- services/apps/premium/members_enrichment_worker/src/service.ts
- services/apps/premium/members_enrichment_worker/src/types.ts
🧰 Additional context used
🪛 Biome
services/apps/premium/members_enrichment_worker/src/activities/enrichment.ts
[error] 155-155: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
[error] 167-167: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
[error] 171-171: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
[error] 180-180: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
[error] 184-184: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
services/libs/common_services/src/repos/llmPromptHistory.repo.ts
[error] 6-8: This constructor is unnecessary.
Unsafe fix: Remove the unnecessary constructor.
(lint/complexity/noUselessConstructor)
🔇 Additional comments (34)
services/libs/common_services/src/services/index.ts (2)
`2-2`: **LGTM! Export follows consistent pattern**
The new export follows the same pattern as other exports in the file, maintaining consistency.
`2-2`: **Verify architectural placement of LLM service**
Since this is being added to common_services, let's verify there are no circular dependencies and that this is the appropriate location for LLM functionality.
✅ Verification successful
Let me gather more information about the dependencies and imports to verify the architectural placement.
Based on the gathered information, I can now provide the final response:
LLM service placement in common_services is architecturally sound
The placement is appropriate because:
- The service is a foundational utility that provides LLM capabilities through AWS Bedrock
- It has no circular dependencies, only depending on core utilities and types
- It's correctly consumed by higher-level services like merge_suggestions_worker and members_enrichment_worker
- It shares the same level of abstraction as other common services (priority.service)
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for potential circular dependencies and verify service placement
# Check for imports of llm.service across the codebase
echo "Checking usage of llm.service:"
rg -l "import.*llm\.service"
# Check if LlmService has any imports from other services
echo -e "\nChecking LlmService dependencies:"
rg -A 5 "import.*from.*services" "./services/libs/common_services/src/services/llm.service.ts"
# Look for other LLM-related code to ensure proper centralization
echo -e "\nChecking for other LLM-related code:"
rg -l "llm|LLM|bedrock|Bedrock" --type ts
Length of output: 2003
Script:
#!/bin/bash
# Check the content of llm.service.ts to understand its dependencies
echo "Checking llm.service.ts content:"
cat "./services/libs/common_services/src/services/llm.service.ts"
# Check what other services are in common_services
echo -e "\nListing other services in common_services:"
ls -1 "./services/libs/common_services/src/services/"
# Check if any other services depend on llm.service
echo -e "\nChecking imports of llm.service:"
rg "from.*llm\.service" -A 2
Length of output: 4983
services/libs/types/src/enums/llm.ts (1)
`6-8`: **Verify if additional query types are planned.**
The enum is correctly implemented. However, having a single member suggests that more query types might be added in the future.
Let's check if there are plans for additional query types:
✅ Verification successful
Based on the search results, I can see that `LlmQueryType.MEMBER_ENRICHMENT` is currently the only query type being used in the codebase, specifically in `services/libs/common_services/src/services/llm.service.ts`. There are no TODO comments or GitHub issues discussing additional query types.
No additional query types are currently planned.
The single member suggests this is a focused implementation for member enrichment functionality rather than a placeholder for future expansion. The enum is correctly implemented for its current purpose.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Search for TODO comments or discussions about additional query types
# Check for TODO comments mentioning LLM queries
rg -i "todo.*llm.*query"
# Check for references to LlmQueryType to understand current usage
ast-grep --pattern 'LlmQueryType.$_'
# Check GitHub issues for planned LLM query types
gh issue list --search "llm query type"
Length of output: 188736
services/apps/premium/members_enrichment_worker/src/main.ts (1)
`1-5`: **LGTM! Verify service initialization pattern.**
The shift to a service-based pattern with centralized configuration is a good architectural improvement.
Let's verify the consistency of this pattern across the codebase:
✅ Verification successful
Let me check the service initialization pattern in more detail to verify its consistency.
Let me check the service initialization pattern in the specific service files.
Based on the verification results, I can now provide a final response:
Service initialization pattern is consistent and well-structured
The implementation follows the standard service worker pattern using @crowd/archetype-worker
:
- Properly configured with required environment variables
- Consistent use of
ServiceWorker
with typedConfig
andOptions
- Clean separation of configuration from business logic
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for consistent service initialization pattern
# Expected: All worker services should follow similar pattern
# Check for similar service.ts files in other workers
fd -e ts -g 'service.ts' services/apps/
# Check for consistent import pattern
rg -g '*.ts' "import \{ svc \} from './service'" services/apps/
Length of output: 676
Script:
#!/bin/bash
# Check the implementation of service.ts files
ast-grep --pattern 'export const svc = $_'
# Check for any other service initialization patterns
rg -g '*.ts' -B 2 -A 2 'createService|initializeService|ServiceConfig' services/apps/premium/members_enrichment_worker/
Length of output: 50155
Script:
#!/bin/bash
# Check the service implementation in members_enrichment_worker
cat services/apps/premium/members_enrichment_worker/src/service.ts
# Check for service worker pattern in other files
rg -g '*.ts' -B 2 -A 2 'class ServiceWorker' services/apps/
Length of output: 910
services/apps/premium/members_enrichment_worker/src/activities/lf-auth0/getLFIDEnrichableMembers.ts (1)
`4-4`: **Verify service import path change across the codebase**

The import path change from `../../main` to `../../service` appears to be part of a broader restructuring. Let's verify this change is consistent across the codebase.
✅ Verification successful
Let me verify the content of the main service file to ensure it's properly structured and contains the expected exports.
Let me check the content of main.ts and any references to it to ensure the migration is complete and correct.
Import path change is correct and consistent
The verification shows that:
- The `service.ts` file exists and exports the `svc` instance correctly
- All imports of `svc` across the codebase are using the correct path from `service`
- The `main.ts` file is still present but serves as the application entry point, importing `svc` from `./service`
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for any remaining imports from 'main' that should be updated to 'service'
# and verify the new 'service' path exists and is properly structured
# Check for any remaining imports from 'main'
echo "Checking for remaining imports from 'main'..."
rg "from '\.\./\.\./main'" services/apps/premium/members_enrichment_worker/src/
# Verify the existence and structure of the new service path
echo "Verifying new service path..."
fd -t f "service.ts" services/apps/premium/members_enrichment_worker/src/
Length of output: 810
Script:
#!/bin/bash
# Check the content of the main service file
echo "Content of main service file:"
cat services/apps/premium/members_enrichment_worker/src/service.ts
# Check if there are any other files still importing from main.ts
echo -e "\nChecking for main.ts file:"
fd -t f "main.ts" services/apps/premium/members_enrichment_worker/src/
# Check for any imports referencing main.ts (with different path patterns)
echo -e "\nChecking for any main.ts imports:"
rg "from.*main'" services/apps/premium/members_enrichment_worker/src/
Length of output: 1275
Script:
#!/bin/bash
# Check the content of main.ts
echo "Content of main.ts:"
cat services/apps/premium/members_enrichment_worker/src/main.ts
# Look for any imports of 'svc' in the codebase
echo -e "\nChecking for svc imports:"
rg "import.*\{ ?svc ?}.*from" services/apps/premium/members_enrichment_worker/src/
Length of output: 1934
services/libs/types/src/enums/index.ts (1)
`20-20`: **LGTM! Verify the referenced module.**
The new export follows the established pattern and maintains alphabetical ordering.
Let's verify the existence of the referenced module and its contents:
✅ Verification successful
Export from './llm' is valid and contains the expected enums
The verification confirms:
- The `llm.ts` module exists at the correct location
- Contains both expected enums:
  - `LlmModelType` with Claude model variants
  - `LlmQueryType` with member enrichment type
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the existence and contents of the llm enum module
# Check if the llm.ts file exists in the enums directory
fd -p "services/libs/types/src/enums/llm.ts$"
# If it exists, check its contents for the expected enums
if [ $? -eq 0 ]; then
echo "Checking enum definitions..."
ast-grep --pattern 'enum LlmModelType {
$$$
}' services/libs/types/src/enums/llm.ts
ast-grep --pattern 'enum LlmQueryType {
$$$
}' services/libs/types/src/enums/llm.ts
fi
Length of output: 853
services/libs/common_services/package.json (2)
`15-15`: **Verify security advisories for AWS SDK Bedrock client.**
Let's ensure this version is free from known vulnerabilities.
✅ Verification successful
No security vulnerabilities found for `@aws-sdk/client-bedrock-runtime@3.572.0`
Both npm audit and GitHub security advisories show no known vulnerabilities for this package version.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for security advisories for the AWS SDK Bedrock client
# Check npm for security advisories
echo "Checking npm advisories for @aws-sdk/client-bedrock-runtime:"
npm audit --json @aws-sdk/client-bedrock-runtime@3.572.0 2>/dev/null | jq '.advisories'
# Check for GitHub security advisories
echo "Checking GitHub security advisories:"
gh api graphql -f query='
{
securityVulnerabilities(first: 5, ecosystem: NPM, package: "@aws-sdk/client-bedrock-runtime") {
nodes {
advisory {
summary
severity
publishedAt
}
vulnerableVersionRange
firstPatchedVersion {
identifier
}
}
}
}'
Length of output: 659
`15-15`: **LGTM! Verify AWS SDK version compatibility.**

The addition of `@aws-sdk/client-bedrock-runtime` is appropriate for implementing LLM capabilities through AWS Bedrock. The caret versioning (`^3.572.0`) is a good practice for maintaining stability while allowing security updates.
Let's verify AWS SDK version compatibility:
backend/src/database/migrations/V1731052735__llm-prompt-history.sql (1)
`15-17`: **Optimize index strategy**

The current index setup might be redundant:
The current index setup might be redundant:
- The composite index `ix_llmPromptHistory_type_entityId` can already serve queries on the `type` column alone
- Having separate indexes on `type` and `entityId` might not provide additional performance benefits and could impact write performance
Consider keeping only the composite index and the index on `entityId`:
create index "ix_llmPromptHistory_type_entityId" on "llmPromptHistory"("type", "entityId");
create index "ix_llmPromptHistory_entityId" on "llmPromptHistory"("entityId");
-create index "ix_llmPromptHistory_type" on "llmPromptHistory"("type");
Let's verify if there are any existing queries that might specifically benefit from the separate type index:
✅ Verification successful
Index optimization confirmed as beneficial
Based on the codebase analysis, the current usage patterns show only insert operations (`insertPromptHistoryEntry`) and no explicit query patterns that would benefit from a separate index on the `type` column. The composite index `ix_llmPromptHistory_type_entityId` will efficiently cover queries filtering by `type` alone, making the separate `type` index redundant.
- The repository only shows INSERT operations
- No SELECT queries were found filtering by `type` alone
- The composite index can already optimize type-based queries if needed in the future
Proceed with removing the separate `type` index as suggested, keeping only:

- `ix_llmPromptHistory_type_entityId` (covers type-based queries)
- `ix_llmPromptHistory_entityId` (for entity-specific lookups)
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for queries that filter by type without entityId
rg -g '*.{ts,js}' 'llmPromptHistory.*where.*type\s*=' -A 5
Length of output: 60
Script:
#!/bin/bash
# Let's try a broader search for any queries involving llmPromptHistory table
rg -g '*.{ts,js}' 'llmPromptHistory.*where' -A 5
# Also search for any TypeORM repository usage or query builders
rg -g '*.{ts,js}' 'llmPromptHistory.*find' -A 5
# Search for any direct SQL queries
rg -g '*.{sql,ts,js}' 'SELECT.*FROM.*llmPromptHistory' -A 5
Length of output: 162
Script:
#!/bin/bash
# Let's check for the entity definition and repository to understand the query patterns
rg -g '*.{ts,js}' 'class.*LlmPromptHistory' -A 20
# Look for any imports or references to this table
rg -g '*.{ts,js}' 'llmPromptHistory' -i
# Check for any TypeORM decorators that might indicate query patterns
rg -g '*.{ts,js}' '@Index.*llmPromptHistory' -A 5
Length of output: 3035
services/apps/premium/members_enrichment_worker/src/activities/getMembers.ts (2)
Line range hint `1-28`: **Implementation looks good!**
The overall implementation is well-structured with:
- Clear separation of concerns
- Proper use of factory pattern for enrichment services
- Type-safe interfaces
- Efficient member fetching with pagination support
`9-9`: **Verify import path consistency across the service**
The import path change from '../main' to '../service' appears to be part of a broader service restructuring.
✅ Verification successful
The verification results show:
- No remaining '../main' imports found in the codebase
- The service.ts file exists at the expected location
- The 'svc' is properly exported from the service module
Let me verify one more aspect - the actual import usage in the activities directory to ensure consistency:
Import path change is consistent and correctly implemented
The import path change from '../main' to '../service' has been consistently applied across all activity files, and the service module properly exports the required interface. All files in the activities directory are using the new import path, and there are no remaining references to the old '../main' import.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that all files in the service use the new import path consistently
# and that the service module exists and exports the expected interface
# Check if other files still use the old import path
echo "Checking for any remaining '../main' imports..."
rg "from '\.\./main'" services/apps/premium/members_enrichment_worker/
# Verify the existence of the service module
echo "Verifying '../service' module exists..."
fd -t f "service.ts" services/apps/premium/members_enrichment_worker/
# Check the service interface
echo "Checking service exports..."
rg "export.*svc" services/apps/premium/members_enrichment_worker/src/service.ts
Length of output: 1028
Script:
#!/bin/bash
# Check all imports in the activities directory
echo "Checking imports in activities directory..."
rg "from '\.\./service'" services/apps/premium/members_enrichment_worker/src/activities/
Length of output: 529
services/libs/types/src/index.ts (1)
`64-64`: **LGTM! Verify the module path.**
The new export follows the existing pattern and integrates well with the codebase structure.
Let's verify the module path and check for potential circular dependencies:
✅ Verification successful
Export path verified and no circular dependencies found ✅
The module structure is valid:
- `llm.ts` exists and imports from `./enums` without circular dependencies
- Contains well-structured interfaces and constants for LLM functionality
- Follows TypeScript best practices for type definitions
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the llm module exists and check for circular dependencies
# Check if the llm module exists
fd -t f "llm.(ts|js)" "services/libs/types/src"
# Check for potential circular dependencies
rg -l "from '\.\.\/index'" "services/libs/types/src/llm"
Length of output: 273
Script:
#!/bin/bash
# Check for potential circular dependencies with corrected regex
rg "from '../index'" "services/libs/types/src/llm.ts"
# Check the content of llm.ts to understand its structure
cat "services/libs/types/src/llm.ts"
# Check if llm.ts re-exports from enums/llm.ts
rg "from.*enums/llm" "services/libs/types/src/llm.ts"
Length of output: 2678
services/apps/premium/members_enrichment_worker/src/activities/syncEnrichedData.ts (1)
`4-4`: **Verify the service import path change across the codebase**
The import path change from '../main' to '../service' appears to be part of a broader restructuring. Let's verify this change is consistent across related files.
✅ Verification successful
Let me verify the import path changes more thoroughly by checking the actual content of these files to ensure the service import is used consistently and correctly.
Import path change is consistent across the codebase
The import change from '../main' to '../service' has been consistently implemented across the codebase. All files that require the service instance are using the new '../service' import path, and there are no remaining references to the old '../main' import. The change appears to be part of a successful refactoring effort.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for any remaining '../main' imports that might need updating
# and verify the new '../service' import is used consistently
echo "Checking for any remaining old imports..."
rg -l "from '\.\./main'" services/apps/premium/members_enrichment_worker/
echo "Verifying new service import usage..."
rg -l "from '\.\./service'" services/apps/premium/members_enrichment_worker/
Length of output: 735
Script:
#!/bin/bash
# Check the actual import statements and their usage in the files
echo "Checking service import usage in all related files..."
rg "from '\.\./service'" services/apps/premium/members_enrichment_worker/src/ -A 2
# Also verify if there are any other variations of service imports
echo -e "\nChecking for other service-related imports..."
rg "import.*service" services/apps/premium/members_enrichment_worker/src/
Length of output: 4144
services/libs/common_services/src/repos/llmPromptHistory.repo.ts (2)
`1-5`: **LGTM! Clean imports and proper class declaration.**
The imports are well-organized and the class properly extends RepositoryBase with correct type parameters.
`10-30`: **Consider adding createdAt to match database schema.**

The implementation looks solid with proper SQL injection protection and metadata handling. However, according to the AI summary, the database table includes a `createdAt` column which isn't being set in this insert statement.
Let's verify the table schema:
If the column exists without a default value, consider adding it to the insert statement:
insert into "llmPromptHistory"(
type, model, "entityId", metadata, prompt,
answer, "inputTokenCount", "outputTokenCount",
- "responseTimeSeconds"
+ "responseTimeSeconds", "createdAt"
)
values(
$(type), $(model), $(entityId), $(metadata),
$(prompt), $(answer), $(inputTokenCount),
- $(outputTokenCount), $(responseTimeSeconds)
+ $(outputTokenCount), $(responseTimeSeconds), NOW()
);
services/apps/premium/members_enrichment_worker/src/activities/lf-auth0/authenticateLFAuth0.ts (1)
`5-5`: **LGTM! Verify the new service module path.**

The import path change from `main` to `service` aligns with the broader restructuring effort.
Let's verify the new import path exists:
✅ Verification successful
Import path change is valid and correctly implemented ✅
The new `service.ts` module exists at the correct location and exports the `svc` instance of `ServiceWorker` with proper configuration, maintaining the expected interface. The change from `main` to `service` is consistent with the codebase structure.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the existence and exports of the new service module
# Check if the service.ts file exists
fd -t f "service.ts" "services/apps/premium/members_enrichment_worker/src"
# Check the exports from the service module
rg -A 5 "export.*svc" "services/apps/premium/members_enrichment_worker/src"
Length of output: 738
Script:
#!/bin/bash
# Let's verify the service module's interface and exports
cat services/apps/premium/members_enrichment_worker/src/service.ts
Length of output: 848
services/apps/premium/members_enrichment_worker/src/activities.ts (2)
`9-9`: **LGTM! Clean addition of the new function export.**
The changes follow good practices:
- Function name is descriptive and follows naming conventions
- Import and export are properly organized
Also applies to: 34-34
`9-9`: **Verify the implementation of processMemberSources**
Since this function processes multiple sources and uses LLM for consolidation, let's verify its implementation details for potential performance considerations and rate limiting.
Also applies to: 34-34
✅ Verification successful
Rate limiting and performance considerations are properly handled
The implementation of `processMemberSources` has several built-in safeguards and optimizations:
- Credit-based rate limiting through a `hasRemainingCredits` check before data enrichment
- Redis-based caching of credit status with 60-second TTL
- Source-specific cache obsolescence checks
- LLM data consolidation with confidence thresholds (>= 0.85)
- Optimization of response size by prioritizing and filtering data
The function also includes proper error handling and logging for monitoring purposes.
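To illustrate the caching pattern described above (the real implementation is Redis-backed; this in-memory version with hypothetical names only sketches the 60-second TTL idea):

```typescript
// Cache the credit check result for 60 seconds so repeated enrichment calls
// don't hit the credits endpoint on every member.
const CREDIT_CACHE_TTL_MS = 60 * 1000
let cached: { hasCredits: boolean; fetchedAt: number } | undefined

async function hasRemainingCreditsCached(
  checkCredits: () => Promise<boolean>,
): Promise<boolean> {
  if (cached && Date.now() - cached.fetchedAt < CREDIT_CACHE_TTL_MS) {
    return cached.hasCredits
  }
  const hasCredits = await checkCredits()
  cached = { hasCredits, fetchedAt: Date.now() }
  return hasCredits
}
```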
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check the implementation of processMemberSources for performance considerations
# Find and display the implementation
ast-grep --pattern 'function processMemberSources($_) {
$$$
}'
# Look for rate limiting or throttling mechanisms
rg -A 5 'rate|limit|throttle' ./services/apps/premium/members_enrichment_worker/src/activities/enrichment.ts
Length of output: 177
Script:
#!/bin/bash
# Let's try a different approach to find the implementation
# Search for the file containing processMemberSources
fd "activities.ts|enrichment.ts" services/apps/premium/members_enrichment_worker/src/
# Once we find the file, let's look at its content
rg -A 20 "processMemberSources" services/apps/premium/members_enrichment_worker/src/
Length of output: 12521
Script:
#!/bin/bash
# Let's look at the complete implementation of processMemberSources and related functions
# Get the full content of enrichment.ts
cat services/apps/premium/members_enrichment_worker/src/activities/enrichment.ts
# Also check for any rate limiting configurations in the service
rg -l "rate|limit|throttle" services/apps/premium/members_enrichment_worker/src/
Length of output: 12253
services/apps/premium/members_enrichment_worker/package.json (1)
`20-20`: **LGTM!**

The new dependency `@crowd/common_services` follows the workspace package pattern and is correctly placed in alphabetical order.
services/apps/premium/members_enrichment_worker/src/activities/lf-auth0/enrichLFAuth0.ts (1)
`8-8`: **Verify the service import path change across the codebase.**

The import path change from `../../main` to `../../service` appears to be part of a larger refactoring effort.
Let's verify this change is consistent across related files:
✅ Verification successful
Import path change is consistent across the codebase
The import change from `../../main` to `../../service` is part of a consistent pattern across the codebase. All files that import `svc` are now correctly importing it from the `service` module, with proper relative paths (e.g., `./service`, `../service`, or `../../service` depending on their location in the directory structure). No remaining imports from `../../main` were found.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for any remaining imports from '../../main'
# Expect: No results, indicating all imports have been updated
rg -l "from '../../main'" services/apps/premium/members_enrichment_worker/src/
Length of output: 86
Script:
#!/bin/bash
# Let's verify the import changes more thoroughly:
# 1. Check for any imports from 'service'
echo "=== Checking imports from 'service' ==="
rg "from ['\"].*service['\"]" services/apps/premium/members_enrichment_worker/src/
# 2. Check all import statements in the directory to understand the pattern
echo -e "\n=== Checking all import statements in the directory ==="
rg "^import.*from ['\"]\.\./" services/apps/premium/members_enrichment_worker/src/
# 3. Check specifically for the 'svc' import
echo -e "\n=== Checking for 'svc' imports ==="
rg "import.*\{ ?svc ?\}" services/apps/premium/members_enrichment_worker/src/
Length of output: 10016
services/apps/premium/members_enrichment_worker/src/schedules/getMembersToEnrich.ts (3)
Line range hint `8-38`: **LGTM! The schedule configuration is well-structured**
The daily schedule at 6 AM with SKIP overlap policy is appropriate for a batch enrichment task. The retry configuration with a 15-second initial interval and maximum of 3 attempts provides good resilience while preventing excessive retries.
`5-5`: **Verify the consistency of service imports across the codebase**
The import path has been changed from '../main' to '../service'. Let's ensure this change is consistent across all files.
Line range hint `40-71`: **Review the scheduling frequency and retry configuration**
A few concerns about the current configuration:
- The bi-monthly schedule (`0 0 1 */2 *`) in production seems unusually infrequent. Consider if this aligns with business requirements for LFID enrichment.
- The `BUFFER_ONE` overlap policy with a bi-monthly schedule might cause resource contention if tasks take longer than expected.
- The 2-second retry interval might be too aggressive and could lead to unnecessary load during failures.
Consider these adjustments:
- Review if a monthly or weekly schedule would be more appropriate for production
- Consider using `ScheduleOverlapPolicy.SKIP` if tasks might take longer than expected
- Increase the retry initial interval to at least 5-10 seconds to prevent overwhelming the system during failures
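A hedged sketch of what those adjustments could look like with the Temporal TypeScript client (the schedule id, workflow type, and task queue are placeholders, not this PR's actual values):

```typescript
import { Client, ScheduleOverlapPolicy } from '@temporalio/client'

async function scheduleLFIDEnrichment(client: Client): Promise<void> {
  await client.schedule.create({
    scheduleId: 'members-lfid-enrichment',
    policies: {
      // SKIP avoids queueing a second run if the previous one is still going
      overlap: ScheduleOverlapPolicy.SKIP,
    },
    spec: {
      cronExpressions: ['0 0 1 * *'], // monthly instead of bi-monthly
    },
    action: {
      type: 'startWorkflow',
      workflowType: 'getMembersForLFIDEnrichment',
      taskQueue: 'members-enrichment',
      args: [],
      retry: {
        initialInterval: '10s', // up from 2s to avoid hammering on failure
        maximumAttempts: 3,
      },
    },
  })
}
```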
services/libs/types/src/llm.ts (1)
`91-112`: **LGTM! Well-structured enrichment result interface.**

The `LlmMemberEnrichmentResult` interface provides a comprehensive structure for handling various aspects of member enrichment, including confidence scores, identity management, attributes, and organization relationships.
services/apps/premium/members_enrichment_worker/src/activities/lf-auth0/githubIdentities.ts (1)
`9-9`: **Verify service import path changes across the codebase**

The import path change from `../../main` to `../../service` suggests a service layer restructuring.
✅ Verification successful
The results show that there are no remaining imports from '../../main', and all relevant files in the lf-auth0 directory consistently use the new '../../service' import path. This confirms that the import path change has been applied consistently across the codebase.
Import path changes are consistent across the codebase ✅
The migration from `../../main` to `../../service` has been properly implemented across all related files in the members enrichment worker service.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for any remaining old import paths and verify consistency
# Look for any remaining imports from 'main'
rg -l "from '../../main'" "services/apps/premium/members_enrichment_worker/src/"
# Look for new service imports to verify consistency
rg -l "from '../../service'" "services/apps/premium/members_enrichment_worker/src/"
Length of output: 648
services/apps/premium/members_enrichment_worker/src/workflows/enrichMember.ts (2)
`104-104`: **Verify integration with the new processing approach**

The code now uses `processMemberSources` instead of data squashing. Ensure that all necessary data transformations are handled correctly in the new processing flow.
✅ Verification successful
Let me search for the implementation of `processMemberSources` to understand how it handles the data transformations.
Let me check the implementation of `processMemberSources` in the enrichment activities file to understand how it handles the data processing.
Based on the implementation of `processMemberSources`, I can now provide a final response:
The new processing approach is properly implemented
The code correctly handles data transformations through a well-structured process:
- Fetches and normalizes enrichment data from different sources
- Uses LLM-based data consolidation for multiple sources with confidence scoring
- Implements proper data cleaning (removing contributions and reach data to manage context size)
- Follows strict verification rules for identities and data consolidation
- Includes safeguards like confidence thresholds (0.85) for applying changes
The removal of the old squashing approach is intentional, replaced with a more sophisticated LLM-based consolidation system that provides better data quality control.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for the implementation and usage of processMemberSources
ast-grep --pattern 'async function processMemberSources($_) {
$$$
}'
# Search for any remaining references to the old squashing approach
rg -i 'squash|normalize' --type ts
Length of output: 35172
Script:
#!/bin/bash
# Search for the implementation of processMemberSources
ast-grep --pattern 'async function processMemberSources($_) {
$$$
}'
# Also search for any direct references to this function
rg 'processMemberSources' --type ts -A 3
Length of output: 2866
Script:
#!/bin/bash
# Get the implementation of processMemberSources from the enrichment activities file
cat services/apps/premium/members_enrichment_worker/src/activities/enrichment.ts
Length of output: 11587
`22-22`: **Verify timeout settings for processing multiple sources**

The timeout of 20 seconds might be insufficient when processing multiple enrichment sources simultaneously. Consider adjusting the `startToCloseTimeout` based on the expected processing time for multiple sources.
Also applies to: 23-31
services/archetypes/worker/src/index.ts (2)
`206-249`: **LGTM! Well-structured Temporal worker initialization.**

The implementation correctly:
The implementation correctly:
- Handles TLS configuration with proper certificate encoding
- Configures worker settings with appropriate defaults
- Sets up workflow bundling and activity registration
`103-103`: **Verify the impact of the method signature change.**

The addition of the optional `initWorker` parameter is a good enhancement for flexibility. However, since this modifies a method signature in what appears to be a base worker class, we should verify that all inheriting classes handle this change correctly.
✅ Verification successful
No impact from the method signature change
The search results show that while many services import and use `ServiceWorker`, none of them explicitly override the `init` method. The only two `init` method implementations found are:

- In the `ServiceWorker` class itself (the file under review)
- In the `Consumer` archetype, which is a different base class

Since no inheriting classes override the `init` method, the addition of the optional parameter with a default value of `true` maintains backward compatibility and has no breaking impact on the codebase.
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for classes that extend ServiceWorker and override init
ast-grep --pattern 'class $_ extends ServiceWorker {
$$$
init($$$) {
$$$
}
$$$
}'
Length of output: 95
Script:
#!/bin/bash
# Let's try a different approach to find classes extending ServiceWorker
# First, find all TypeScript files
echo "=== Searching for ServiceWorker imports ==="
rg "import.*ServiceWorker" -A 2
echo -e "\n=== Searching for classes extending ServiceWorker ==="
rg "extends\s+ServiceWorker" -A 5
echo -e "\n=== Searching for init method implementations ==="
rg "override\s+async\s+init" -A 3
Length of output: 4159
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts (2)
282-288
: LGTM: Improved email handling with better type safety.
The changes improve the robustness of email handling by supporting both array and string formats. The implementation correctly:
- Handles type variations using Array.isArray()
- Maintains email validation using isEmail
- Preserves the existing comma-separation logic for string inputs (see the sketch below)
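A self-contained sketch of that normalization logic (the isEmail predicate is simplified here; the real code uses a proper validator):
// Accept either an array of emails or a single comma-separated string and
// return only syntactically valid, trimmed addresses.
const isEmail = (value: string): boolean => /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(value)

function normalizeEmails(email: string[] | string | undefined): string[] {
  if (!email) return []
  const candidates = Array.isArray(email) ? email : email.split(',')
  return candidates.map((e) => e.trim()).filter(isEmail)
}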
282-289
: Verify email handling in upstream data sources.
The change in email handling logic suggests that the data format from Crustdata API might have changed.
✅ Verification successful
Let me gather more specific information about the email handling in these files.
Let me check one more thing to see the Crustdata response type definition specifically for the email field.
Email handling change appears to be a bug fix, not an API change
The Crustdata API type definition clearly shows email
is defined as a string (email: string
), but the code handles it as both array and comma-separated string. This suggests the change is actually fixing a bug in the code to handle malformed data, rather than reflecting an API change. Other enrichment services (Progai, Clearbit) handle emails consistently with their respective type definitions.
services/apps/premium/members_enrichment_worker/src/sources/crustdata/types.ts
: Type definition showsemail: string
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts
: Code defensively handles both array and comma-separated string formats
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for other places where Crustdata email data is handled
rg -l "data\.email" --type ts
# Look for Crustdata API response type definitions
ast-grep --pattern 'interface $_Response {
$$$
email: $_
$$$
}'
Length of output: 712
Script:
#!/bin/bash
# Check email handling in other enrichment services
rg -A 5 "data\.email" services/apps/premium/members_enrichment_worker/src/sources/progai/service.ts
rg -A 5 "data\.email" services/apps/premium/members_enrichment_worker/src/sources/clearbit/service.ts
# Look for type definitions in Crustdata
rg -A 10 "interface.*Response" services/apps/premium/members_enrichment_worker/src/sources/crustdata/
Length of output: 1923
Script:
#!/bin/bash
# Look for the main Crustdata response type that includes email
rg -B 5 -A 15 "email.*:" services/apps/premium/members_enrichment_worker/src/sources/crustdata/types.ts
Length of output: 672
services/libs/data-access-layer/src/old/apps/premium/members_enrichment_worker/index.ts (3)
533-536
: Ensure consistent transaction usage
In the resetMemberEnrichedAt
function, you're accepting a DbConnOrTx
object as tx
. Ensure that when calling this function within a transaction, you're consistently using the transaction object to maintain atomicity.
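For illustration, consistent usage would pass the same transaction handle to every call (a sketch assuming a pg-promise-style tx helper, which is what DbConnOrTx suggests; the exact parameter order is illustrative):
// Both writes share one transaction handle, so the cache update and the
// lastEnriched reset commit or roll back together.
await db.tx(async (tx) => {
  await updateMemberEnrichmentCacheDb(tx, memberId, data)
  await resetMemberEnrichedAt(tx, memberId)
})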
549-550
: Confirm the impact of resetting lastEnriched
after cache updates
After inserting or updating the member enrichment cache in insertMemberEnrichmentCacheDb
and updateMemberEnrichmentCacheDb
, you're resetting the lastEnriched
field to null
(lines 549–550 and 568–569). Verify that this behavior aligns with the application's logic and that downstream processes correctly handle a null
lastEnriched
value.
Also applies to: 568-569
588-590
: Verify all usages of findMemberEnrichmentCacheDb
accommodate the new parameters
The findMemberEnrichmentCacheDb
function now accepts an array of sources
(lines 588–590) and returns multiple results. Ensure that all calls to this function have been updated to pass an array of MemberEnrichmentSource
and properly handle the array of results to prevent runtime errors.
You can run the following script to identify usages that may need updates:
This will help you locate all instances where the function is called and verify that they conform to the new signature.
✅ Verification successful
Let me check the implementation and usage context more thoroughly.
Let me try a different search pattern to get the implementation details.
Let me try one final search to find the enum definition and any other usages.
Function usage is properly aligned with the new signature
The verification shows that the only usage of findMemberEnrichmentCacheDb
is in the findMemberEnrichmentCache
function in services/apps/premium/members_enrichment_worker/src/activities/enrichment.ts
, and it correctly:
- Accepts an array of
MemberEnrichmentSource
as the first parameter - Passes it along with the correct parameter order (connection, memberId, sources)
- Expects an array of
IMemberEnrichmentCache
as the return type
The implementation and usage are consistent with the changes, and no updates are needed in other parts of the codebase.
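For reference, the updated call shape looks roughly like this (a sketch; the enum members are examples, and the parameter order follows the verification above):
const caches = await findMemberEnrichmentCacheDb(
  svc.postgres.reader.connection(),
  memberId,
  [MemberEnrichmentSource.PROGAI, MemberEnrichmentSource.CLEARBIT],
)
// One cache row per requested source; pick out the one you need.
const progaiCache = caches.find((c) => c.source === MemberEnrichmentSource.PROGAI)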
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Find all calls to `findMemberEnrichmentCacheDb` and check for the correct usage of the new parameters.
# Search for function calls
rg 'findMemberEnrichmentCacheDb\(' -A 3
Length of output: 504
Script:
#!/bin/bash
# Get the implementation of findMemberEnrichmentCacheForAllSources to see how it uses the result
ast-grep --pattern 'export async function findMemberEnrichmentCacheForAllSources($_) {
$$$
}'
# Get the implementation of findMemberEnrichmentCacheDb to verify the signature change
ast-grep --pattern 'export async function findMemberEnrichmentCacheDb($_) {
$$$
}'
# Check for any tests that might show the usage pattern
rg "findMemberEnrichmentCacheDb" -g "*.test.ts" -g "*.spec.ts" -A 5
Length of output: 252
Script:
#!/bin/bash
# Get more context around the function call to see how it's being used
rg 'findMemberEnrichmentCacheDb' -B 5 -A 10
# Get the type definition of MemberEnrichmentSource to understand the parameter
rg 'type MemberEnrichmentSource' -A 5
Length of output: 5520
Script:
#!/bin/bash
# Find the enum definition and any other usages
rg 'enum MemberEnrichmentSource' -A 10
# Look for any other calls to the function in the codebase
rg -g '!*.{md,json}' 'findMemberEnrichmentCacheDb\(' -B 3 -A 3
Length of output: 1410
public constructor(dbStore: DbStore, parentLog: Logger) {
  super(dbStore, parentLog)
}
🛠️ Refactor suggestion
Remove unnecessary constructor.
Since the constructor only forwards parameters to the parent class without additional initialization, it can be safely removed. TypeScript will automatically handle the parent constructor call.
- public constructor(dbStore: DbStore, parentLog: Logger) {
- super(dbStore, parentLog)
- }
📝 Committable suggestion
public constructor(dbStore: DbStore, parentLog: Logger) {
  super(dbStore, parentLog)
}
🧰 Tools
🪛 Biome
[error] 6-8: This constructor is unnecessary.
Unsafe fix: Remove the unnecessary constructor.
(lint/complexity/noUselessConstructor)
@@ -6,6 +6,7 @@
   "start:debug": "CROWD_TEMPORAL_TASKQUEUE=members-enrichment SERVICE=members-enrichment-worker LOG_LEVEL=trace tsx --inspect=0.0.0.0:9232 src/main.ts",
   "dev:local": "nodemon --watch src --watch ../../../libs --ext ts --exec pnpm run start:debug:local",
   "dev": "nodemon --watch src --watch ../../../libs --ext ts --exec pnpm run start:debug",
+  "script:onboarding": "SERVICE=script tsx --inspect src/bin/onboarding.ts",
🛠️ Refactor suggestion
Specify a unique debug port for the onboarding script
The --inspect
flag without a port number will use the default Node.js debug port (9229), which could conflict with other services running in debug mode. Consider specifying a unique port to prevent debugging conflicts.
- "script:onboarding": "SERVICE=script tsx --inspect src/bin/onboarding.ts",
+ "script:onboarding": "SERVICE=script tsx --inspect=0.0.0.0:9233 src/bin/onboarding.ts",
📝 Committable suggestion
"script:onboarding": "SERVICE=script tsx --inspect src/bin/onboarding.ts", | |
"script:onboarding": "SERVICE=script tsx --inspect=0.0.0.0:9233 src/bin/onboarding.ts", |
export interface LlmAttributeValue {
  // eslint-disable-next-line @typescript-eslint/no-explicit-any
  default?: any
  // eslint-disable-next-line @typescript-eslint/no-explicit-any
  [source: string]: any
}
🛠️ Refactor suggestion
Improve type safety by avoiding any
.
The use of any
with eslint-disable comments reduces type safety.
Consider using a more specific type or a generic type parameter:
export interface LlmAttributeValue {
- // eslint-disable-next-line @typescript-eslint/no-explicit-any
- default?: any
- // eslint-disable-next-line @typescript-eslint/no-explicit-any
- [source: string]: any
+ default?: unknown
+ [source: string]: unknown
}
Or even better, make it generic:
export interface LlmAttributeValue<T = unknown> {
default?: T
[source: string]: T | undefined
}
export interface LlmIdentity {
  t: string // type
  v: string // value
  p: string // platform
  ve: boolean // verification status
}

export interface LlmOrganizationConnection {
  orgId: string
  t: string // title
  ds: string // dateStart
  de: string // dateEnd
  s: string // source
}
🛠️ Refactor suggestion
Improve property naming for better code readability.
The single-letter property names (t
, v
, p
, ve
, ds
, de
, s
) make the code harder to understand and maintain.
Consider using full descriptive names:
export interface LlmIdentity {
- t: string // type
- v: string // value
- p: string // platform
- ve: boolean // verification status
+ type: string
+ value: string
+ platform: string
+ isVerified: boolean
}
export interface LlmOrganizationConnection {
orgId: string
- t: string // title
- ds: string // dateStart
- de: string // dateEnd
- s: string // source
+ title: string
+ dateStart: string
+ dateEnd: string
+ source: string
}
📝 Committable suggestion
export interface LlmIdentity {
  type: string
  value: string
  platform: string
  isVerified: boolean
}

export interface LlmOrganizationConnection {
  orgId: string
  title: string
  dateStart: string
  dateEnd: string
  source: string
}
export const LLM_SETTINGS: Record<LlmQueryType, ILlmSettings> = {
  [LlmQueryType.MEMBER_ENRICHMENT]: {
    modelId: LlmModelType.CLAUDE_3_5_SONNET,
    arguments: {
      max_tokens: 200000,
      anthropic_version: 'bedrock-2023-05-31',
      temperature: 0,
    },
  },
}
💡 Codebase verification
Adjust max_tokens configuration to align with actual usage patterns
The verification reveals that the max_tokens: 200000
setting in services/libs/types/src/llm.ts
is significantly higher than what's being used in actual implementations. Other LLM workflows in the codebase (mergeOrganizationsWithLLM.ts
and mergeMembersWithLLM.ts
) use a much more reasonable value of 2000 tokens.
services/libs/types/src/llm.ts
: Reducemax_tokens
from 200000 to 2000 for theMEMBER_ENRICHMENT
query type to maintain consistency with other implementations- Consider extracting the common LLM configuration (including
anthropic_version
andtemperature
) into a shared configuration object since these values are identical across implementations (see the sketch below)
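A sketch of the shared-configuration idea (names are illustrative):
// Shared defaults for all Bedrock/Anthropic query types; individual entries
// only override what actually differs.
const COMMON_LLM_ARGUMENTS = {
  anthropic_version: 'bedrock-2023-05-31',
  temperature: 0,
  max_tokens: 2000,
} as const

export const LLM_SETTINGS: Record<LlmQueryType, ILlmSettings> = {
  [LlmQueryType.MEMBER_ENRICHMENT]: {
    modelId: LlmModelType.CLAUDE_3_5_SONNET,
    arguments: { ...COMMON_LLM_ARGUMENTS },
  },
}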
🔗 Analysis chain
Review the LLM configuration settings.
The max_tokens
value of 200,000 seems unusually high. This could lead to:
- Increased costs due to token consumption
- Longer processing times
- Potential timeout issues
Additionally, the hardcoded anthropic_version
might need periodic updates.
Consider:
- Moving these configurations to environment variables or a configuration service
- Implementing a mechanism to update the Anthropic version without code changes
- Setting a more conservative
max_tokens
limit based on actual usage patterns
Run this script to analyze token usage patterns:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Find LLM token usage patterns in the codebase
# Look for token limits in tests or documentation
# Search for token-related configurations
rg -i "max.*tokens|tokens.*limit"
# Search for test cases involving tokens
rg -i "test.*tokens|tokens.*test"
Length of output: 330
Script:
#!/bin/bash
# Let's analyze the context around these token configurations and their usage
# Check the context of token configurations
rg -B 5 -A 5 "max_tokens"
# Look for any LLM response handling or error cases
rg -i "response.*timeout|timeout.*error|token.*limit.*exceed"
# Find any documentation about token limits
fd "README|CONTRIBUTING" --exec grep -i -l "token"
# Look for any configuration patterns
ast-grep --pattern 'max_tokens: $_'
Length of output: 4020
try {
  await this.repo.insertPromptHistoryEntry(type, settings.modelId, result, entityId, metadata)
} catch (err) {
  this.log.error(err, 'Failed to save LLM prompt history entry!')
  throw err
}
🛠️ Refactor suggestion
Handle prompt history saving errors without interrupting the main flow
Currently, if saving the prompt history entry fails, an error is thrown, which stops the main operation. Since saving history is optional, consider logging the error but not throwing it, so that the main response can still be returned to the user.
Apply this diff to prevent throwing an error when history saving fails:
if (saveHistory) {
try {
await this.repo.insertPromptHistoryEntry(type, settings.modelId, result, entityId, metadata)
} catch (err) {
this.log.error(err, 'Failed to save LLM prompt history entry!')
- throw err
}
}
📝 Committable suggestion
try {
  await this.repo.insertPromptHistoryEntry(type, settings.modelId, result, entityId, metadata)
} catch (err) {
  this.log.error(err, 'Failed to save LLM prompt history entry!')
}
const result = JSON.parse(response.answer)
🛠️ Refactor suggestion
Add error handling for JSON parsing of LLM response
Parsing response.answer
without a try-catch block could lead to unhandled exceptions if the JSON is invalid. Add proper error handling to manage parsing errors gracefully.
Apply this diff to add error handling:
- const result = JSON.parse(response.answer)
+ let result: LlmMemberEnrichmentResult
+ try {
+ result = JSON.parse(response.answer)
+ } catch (err) {
+ this.log.error(err, 'Failed to parse LLM response as JSON')
+ throw new Error('Invalid JSON response from LLM')
+ }
📝 Committable suggestion
let result: LlmMemberEnrichmentResult
try {
  result = JSON.parse(response.answer)
} catch (err) {
  this.log.error(err, 'Failed to parse LLM response as JSON')
  throw new Error('Invalid JSON response from LLM')
}
let res: InvokeModelCommandOutput
try {
  res = await client.send(command)
} catch (err) {
  this.log.error(err, { settings, prompt }, 'Failed to query LLM!')
  throw err
}

const body = JSON.parse(res.body.transformToString())
const responseTimeSeconds = end()
Ensure correct parsing of response body from AWS SDK
The method transformToString()
used on res.body
may not be valid for parsing the response. In AWS SDK v3, the response body is typically a readable stream, and you need to properly collect and parse it.
Consider using the collectBodyString
utility function provided by the SDK to correctly read the body.
Here's how you might adjust the code:
+ import { collectBodyString } from '@aws-sdk/util-body-length-node'
let res: InvokeModelCommandOutput
try {
res = await client.send(command)
} catch (err) {
this.log.error(err, { settings, prompt }, 'Failed to query LLM!')
throw err
}
- const body = JSON.parse(res.body.transformToString())
+ const bodyString = await collectBodyString(res.Body)
+ const body = JSON.parse(bodyString)
Alternatively, you can use a helper function to read the stream:
+ import { Readable } from 'stream'
+ const streamToString = (stream: Readable): Promise<string> =>
+ new Promise((resolve, reject) => {
+ const chunks = []
+ stream.on('data', (chunk) => chunks.push(chunk))
+ stream.on('error', reject)
+ stream.on('end', () => resolve(Buffer.concat(chunks).toString('utf8')))
+ })
let res: InvokeModelCommandOutput
try {
res = await client.send(command)
} catch (err) {
this.log.error(err, { settings, prompt }, 'Failed to query LLM!')
throw err
}
- const body = JSON.parse(res.body.transformToString())
+ const bodyString = await streamToString(res.Body as Readable)
+ const body = JSON.parse(bodyString)
Committable suggestion skipped: line range outside the PR's diff.
for (const n of normalized) {
  if (n.contributions) {
    normalizedContributions.push(n.contributions)
    delete n.contributions
🛠️ Refactor suggestion
Avoid using the delete
operator to enhance performance
Using the delete
operator can negatively impact performance. Instead, consider setting the properties to undefined
or using object destructuring to create new objects without those properties.
Apply these changes:
- delete n.contributions
+ n.contributions = undefined
- delete n.reach
+ n.reach = undefined
- delete normalized.contributions
+ normalized.contributions = undefined
- delete normalized.reach
+ normalized.reach = undefined
Also applies to: 171-171, 180-180, 184-184
🧰 Tools
🪛 Biome
[error] 167-167: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
export async function fetchMemberDataForLLMSquashing(
  db: DbStore,
  memberId: string,
): Promise<IMemberOriginalData | null> {
  const result = await db.connection().oneOrNone(
    `
    with member_orgs as (select distinct mo."memberId",
                                         mo."organizationId" as "orgId",
                                         o."displayName"     as "orgName",
                                         mo.title            as "jobTitle",
                                         mo."dateStart",
                                         mo."dateEnd",
                                         mo.source
                         from "memberOrganizations" mo
                                  inner join organizations o on mo."organizationId" = o.id
                         where mo."memberId" = $(memberId)
                           and mo."deletedAt" is null
                           and o."deletedAt" is null)
    select m."displayName",
           m.attributes,
           m."manuallyChangedFields",
           coalesce(
                   (select json_agg(
                                   (select row_to_json(r)
                                    from (select mi.type,
                                                 mi.platform,
                                                 mi.value) r)
                           )
                    from "memberIdentities" mi
                    where mi."memberId" = m.id
                      and verified = true), '[]'::json) as identities,
           coalesce(
                   nullif(
                           json_agg(
                                   (select row_to_json(r)
                                    from (select mo."orgId",
                                                 mo."orgName",
                                                 mo."jobTitle",
                                                 mo."dateStart",
                                                 mo."dateEnd",
                                                 mo.source) r)
                           )::jsonb,
                           jsonb_build_array(null)
                   )::json,
                   '[]'::json
           ) as organizations
    from members m
             left join member_orgs mo on mo."memberId" = m.id
    where m.id = $(memberId)
      and m."deletedAt" is null
    group by m.id, m."displayName", m.attributes, m."manuallyChangedFields";
    `,
    {
      memberId,
    },
  )

  return result
}
🛠️ Refactor suggestion
Optimize the GROUP BY
clause for better performance
In the fetchMemberDataForLLMSquashing
function, the GROUP BY
clause includes m.id
, m."displayName"
, m.attributes
, and m."manuallyChangedFields"
(line 66). Since m.id
uniquely identifies each member, grouping by m.id
alone is sufficient. Including large text or JSON fields like attributes
and manuallyChangedFields
can negatively impact query performance.
Consider modifying the GROUP BY
clause as follows:
- group by m.id, m."displayName", m.attributes, m."manuallyChangedFields";
+ GROUP BY m.id;
📝 Committable suggestion
export async function fetchMemberDataForLLMSquashing(
  db: DbStore,
  memberId: string,
): Promise<IMemberOriginalData | null> {
  const result = await db.connection().oneOrNone(
    `
    with member_orgs as (select distinct mo."memberId",
                                         mo."organizationId" as "orgId",
                                         o."displayName"     as "orgName",
                                         mo.title            as "jobTitle",
                                         mo."dateStart",
                                         mo."dateEnd",
                                         mo.source
                         from "memberOrganizations" mo
                                  inner join organizations o on mo."organizationId" = o.id
                         where mo."memberId" = $(memberId)
                           and mo."deletedAt" is null
                           and o."deletedAt" is null)
    select m."displayName",
           m.attributes,
           m."manuallyChangedFields",
           coalesce(
                   (select json_agg(
                                   (select row_to_json(r)
                                    from (select mi.type,
                                                 mi.platform,
                                                 mi.value) r)
                           )
                    from "memberIdentities" mi
                    where mi."memberId" = m.id
                      and verified = true), '[]'::json) as identities,
           coalesce(
                   nullif(
                           json_agg(
                                   (select row_to_json(r)
                                    from (select mo."orgId",
                                                 mo."orgName",
                                                 mo."jobTitle",
                                                 mo."dateStart",
                                                 mo."dateEnd",
                                                 mo.source) r)
                           )::jsonb,
                           jsonb_build_array(null)
                   )::json,
                   '[]'::json
           ) as organizations
    from members m
             left join member_orgs mo on mo."memberId" = m.id
    where m.id = $(memberId)
      and m."deletedAt" is null
    GROUP BY m.id;
    `,
    {
      memberId,
    },
  )

  return result
}
Actionable comments posted: 21
🧹 Outside diff range and nitpick comments (38)
services/libs/types/src/enums/llm.ts (1)
1-4
: Consider improving version management and documentation for LLM models.
The hardcoded version strings could make future model updates more challenging. Consider:
- Adding JSDoc comments describing each model's capabilities and use cases
- Moving version strings to constants for easier updates
+/** Constants for LLM model versions */
+const CLAUDE_VERSIONS = {
+  SONNET: '20240620-v1:0',
+  OPUS: '20240229-v1:0',
+} as const;
+
+/**
+ * Enum for supported LLM model types
+ * @remarks
+ * - CLAUDE_3_5_SONNET: Lighter, faster model suitable for routine tasks
+ * - CLAUDE_3_OPUS: More powerful model for complex reasoning
+ */
 export enum LlmModelType {
-  CLAUDE_3_5_SONNET = 'anthropic.claude-3-5-sonnet-20240620-v1:0',
-  CLAUDE_3_OPUS = 'anthropic.claude-3-opus-20240229-v1:0',
+  CLAUDE_3_5_SONNET = `anthropic.claude-3-5-sonnet-${CLAUDE_VERSIONS.SONNET}`,
+  CLAUDE_3_OPUS = `anthropic.claude-3-opus-${CLAUDE_VERSIONS.OPUS}`,
 }
services/apps/premium/members_enrichment_worker/src/main.ts (2)
4-4
: Good architectural improvement moving configuration to a service module!
The refactoring to move configuration into a dedicated service module improves separation of concerns and maintainability.
Line range hint
6-16
: Consider adding error handling for initialization failures
While the initialization flow is well-structured, it would benefit from proper error handling to ensure graceful service shutdown in case of initialization failures.
Consider wrapping the initialization in a try-catch block:
 setImmediate(async () => {
+  try {
     await svc.init()
     await scheduleMembersEnrichment()
     if (process.env['CROWD_EDITION'] === Edition.LFX) {
       await scheduleMembersLFIDEnrichment()
     }
     await svc.start()
+  } catch (error) {
+    console.error('Failed to initialize members enrichment worker:', error)
+    process.exit(1)
+  }
 })
services/apps/premium/members_enrichment_worker/src/activities/lf-auth0/getLFIDEnrichableMembers.ts (1)
Line range hint
9-14
: Improve error handling to preserve stack traces
The current error handling loses the original stack trace by creating a new Error. Consider using error chaining to preserve the error context.
 try {
   const db = svc.postgres.reader
   rows = await fetchMembersForLFIDEnrichment(db, limit, afterId)
 } catch (err) {
-  throw new Error(err)
+  throw new Error('Failed to fetch members for LF ID enrichment', { cause: err })
 }
services/libs/common_services/package.json (1)
15-15
: Consider AWS Bedrock cost management strategies.
The addition of AWS Bedrock Runtime client indicates usage of AWS foundation models, which can incur significant costs. Consider implementing:
- Rate limiting and throttling mechanisms
- Cost monitoring and alerting
- Usage quotas per user/organization
- Caching strategies for repeated requests
- Fallback mechanisms for service disruptions
Would you like assistance in implementing any of these cost control mechanisms?
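As a concrete starting point, one of the listed mechanisms could be sketched like this (a hypothetical prompt-level cache, not tied to any specific Bedrock client API):
// Naive in-memory cache keyed by prompt text, to avoid paying for repeated
// identical requests. A production version would bound its size and add
// TTL-based eviction.
const responseCache = new Map<string, string>()

async function cachedInvoke(
  prompt: string,
  invoke: (prompt: string) => Promise<string>,
): Promise<string> {
  const hit = responseCache.get(prompt)
  if (hit !== undefined) return hit
  const answer = await invoke(prompt)
  responseCache.set(prompt, answer)
  return answer
}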
services/libs/common_services/src/repos/llmPromptHistory.repo.ts (1)
6-8
: Remove unnecessary constructor.
The constructor can be removed as it only forwards parameters to the parent class without additional initialization.
-  public constructor(dbStore: DbStore, parentLog: Logger) {
-    super(dbStore, parentLog)
-  }
🧰 Tools
🪛 Biome
[error] 6-8: This constructor is unnecessary.
Unsafe fix: Remove the unnecessary constructor.
(lint/complexity/noUselessConstructor)
services/libs/types/src/premium/enrichment.ts (2)
34-41
: Consider using a more specific type for date fields.
The dateStart and dateEnd fields are currently typed as string. Consider using a more specific type to ensure consistent date formatting and validation.
 export interface IMemberOrganizationData {
   orgId: string
   orgName: string
   jobTitle: string
-  dateStart: string
-  dateEnd: string
+  dateStart: string | null // ISO 8601 format
+  dateEnd: string | null // ISO 8601 format
   source: string
 }
43-54
: Enhance type safety and add JSDoc documentation.
The interface could benefit from:
- More specific typing for the attributes field
- Documentation explaining the purpose and requirements of each field
+/**
+ * Represents the original member data aggregated from multiple sources
+ * @property displayName - The member's display name
+ * @property attributes - Key-value pairs of member attributes grouped by source
+ * @property manuallyChangedFields - List of fields that were manually modified
+ * @property identities - List of member's platform identities
+ * @property organizations - List of member's organization affiliations
+ */
 export interface IMemberOriginalData {
   // members table data
   displayName: string
-  attributes: Record<string, Record<string, unknown>>
+  attributes: Record<string, Record<string, string | number | boolean | null>>
   manuallyChangedFields: string[]
   // memberIdentities table data
   identities: IMemberIdentity[]
   // memberOrganizations table data
   organizations: IMemberOrganizationData[]
 }
services/apps/merge_suggestions_worker/src/workflows/mergeMembersWithLLM.ts (3)
44-46
: Good architectural improvement moving LLM configuration out of the workflow.
The removal of hardcoded LLM parameters and the use of LlmQueryType makes the code cleaner and more maintainable. This change suggests better separation of concerns with centralized LLM configuration management.
44-51
: Consider improving error handling and type safety.
The current implementation could benefit from:
- Explicit typing for the verdict
- Error handling for LLM failures
- Logging for audit and debugging purposes
Consider applying this improvement:
-  const verdict = await commonActivitiesProxy.getLLMResult(
+  const verdict: boolean = await commonActivitiesProxy.getLLMResult(
     LlmQueryType.MEMBER_MERGE,
     members.map((member) => removeEmailLikeIdentitiesFromMember(member)),
   )
+  // Log the decision for audit purposes
+  console.log(`LLM merge decision for members ${suggestion[0]}, ${suggestion[1]}: ${verdict}`)
+
   if (verdict) {
     await commonActivitiesProxy.mergeMembers(suggestion[0], suggestion[1], args.tenantId)
+  } else {
+    console.log(`Skipping merge for members ${suggestion[0]}, ${suggestion[1]} based on LLM verdict`)
   }
Line range hint
24-24
: Consider making SUGGESTIONS_PER_RUN configurable.
The hardcoded batch size of 10 suggestions per run could be moved to a configuration value for better flexibility in different environments.
Consider moving this to a configuration:
-  const SUGGESTIONS_PER_RUN = 10
+  const SUGGESTIONS_PER_RUN = config.get('mergeSuggestions.batchSize', 10)
services/apps/premium/members_enrichment_worker/src/activities.ts (1)
34-34
: LGTM! Consider documenting the new capability
The export is properly added, maintaining symmetry with the import section. Since this introduces a new capability for processing multiple member sources, consider adding JSDoc comments in the implementation file to document its purpose and usage.
services/libs/types/src/llm.ts (2)
35-45
: Consider making pricing configuration external
The pricing information is currently hardcoded, but as noted in the comment, "these numbers can change". Consider moving this to an external configuration for easier updates.
Consider moving the pricing to environment variables or a configuration file:
-export const LLM_MODEL_PRICING_MAP: Record<LlmModelType, ILlmPricing> = {
+export const getLlmModelPricing = (): Record<LlmModelType, ILlmPricing> => ({
   [LlmModelType.CLAUDE_3_OPUS]: {
-    costPer1000InputTokens: 0.015,
-    costPer1000OutputTokens: 0.075,
+    costPer1000InputTokens: Number(process.env.CLAUDE_3_OPUS_INPUT_COST ?? 0.015),
+    costPer1000OutputTokens: Number(process.env.CLAUDE_3_OPUS_OUTPUT_COST ?? 0.075),
   },
   // ... similar changes for other models
-}
+})
67-67
: Address the TODO comment
The TODO comment needs attention: "TODO uros check if this is ok"
Would you like me to help create a GitHub issue to track this TODO item?
services/apps/premium/members_enrichment_worker/src/activities/lf-auth0/enrichLFAuth0.ts (3)
Line range hint
52-71
: Enhance error handling and robustness of the merge operation
While the basic error handling is in place, consider the following improvements for better reliability:
- Add request timeout handling
- Implement retry mechanism for transient failures
- Add detailed error logging including response body
- Validate environment variables before usage
- Add audit logging for merge operations
Here's a suggested implementation:
 export async function mergeMembers(
   primaryMemberId: string,
   secondaryMemberId: string,
   tenantId: string,
 ): Promise<void> {
+  const apiUrl = process.env['CROWD_API_SERVICE_URL'];
+  const apiToken = process.env['CROWD_API_SERVICE_USER_TOKEN'];
+
+  if (!apiUrl || !apiToken) {
+    throw new Error('Required environment variables are not set');
+  }
+
+  const controller = new AbortController();
+  const timeout = setTimeout(() => controller.abort(), 30000);
+
   const res = await fetch(
-    `${process.env['CROWD_API_SERVICE_URL']}/tenant/${tenantId}/member/${primaryMemberId}/merge`,
+    `${apiUrl}/tenant/${tenantId}/member/${primaryMemberId}/merge`,
     {
       method: 'PUT',
       headers: {
-        Authorization: `Bearer ${process.env['CROWD_API_SERVICE_USER_TOKEN']}`,
+        Authorization: `Bearer ${apiToken}`,
         'Content-Type': 'application/json',
       },
       body: JSON.stringify({
         memberToMerge: secondaryMemberId,
       }),
+      signal: controller.signal,
     },
-  )
+  ).finally(() => clearTimeout(timeout));

-  if (res.status !== 200) {
-    throw new Error(`Failed to merge member ${primaryMemberId} with ${secondaryMemberId}!`)
+  const responseBody = await res.text();
+
+  if (!res.ok) {
+    const error = new Error(
+      `Failed to merge member ${primaryMemberId} with ${secondaryMemberId}. ` +
+      `Status: ${res.status}. Response: ${responseBody}`
+    );
+    svc.logger.error(error);
+    throw error;
   }
+
+  svc.logger.info(`Successfully merged member ${secondaryMemberId} into ${primaryMemberId}`);
 }
Line range hint
13-21
: Implement consistent error handling across functions
The current error handling approach loses valuable context and stack traces. Consider implementing a unified error handling strategy:
- Create specific error types for different failure scenarios
- Preserve error context and stack traces
- Add proper error logging
Example implementation:
// Create specific error types
class MemberEnrichmentError extends Error {
  constructor(message: string, public cause?: Error) {
    super(message);
    this.name = 'MemberEnrichmentError';
  }
}

// Update error handling in functions
export async function getIdentitiesExistInOtherMembers(
  tenantId: string,
  excludeMemberId: string,
  identities: IMemberIdentity[],
): Promise<IMemberIdentity[]> {
  try {
    const db = svc.postgres.reader;
    return await getIdentitiesExistInOthers(db, tenantId, excludeMemberId, identities);
  } catch (err) {
    const error = new MemberEnrichmentError(
      `Failed to get existing identities for tenant ${tenantId}`,
      err instanceof Error ? err : undefined
    );
    svc.logger.error(error);
    throw error;
  }
}
Line range hint
1-71
: Address potential security concernsSeveral security considerations should be addressed:
- Add input validation for member IDs and tenant IDs to prevent injection
- Avoid exposing sensitive data in error messages
- Implement rate limiting for merge operations
- Add request validation middleware
Consider implementing these security measures at the service layer to ensure consistent protection across all operations. This could include:
- Input validation middleware
- Rate limiting middleware
- Secure error handling that sanitizes sensitive data
- Audit logging for all member operations
Would you like me to provide specific implementation examples for any of these security measures?
backend/src/database/migrations/V1731052735__llm-prompt-history.sql (1)
1-13
: Consider adding additional constraints for data integrity
While the table structure is generally good, consider these improvements:
- Add check constraints for numeric fields to ensure valid ranges: inputTokenCount and outputTokenCount should be positive, and responseTimeSeconds should be positive
fieldresponseTimeSeconds
- Consider adding a check constraint for valid
type
valuescreate table "llmPromptHistory" ( id bigserial primary key, - type varchar(255) not null, + type varchar(255) not null check (type in ('organization_merge_suggestion', 'member_merge_suggestion')), model text not null, "entityId" text null, metadata jsonb null, prompt text not null, answer text not null, - "inputTokenCount" int not null, - "outputTokenCount" int not null, - "responseTimeSeconds" decimal not null, + "inputTokenCount" int not null check ("inputTokenCount" > 0), + "outputTokenCount" int not null check ("outputTokenCount" > 0), + "responseTimeSeconds" decimal(10,3) not null check ("responseTimeSeconds" > 0), "createdAt" timestamptz not null default now() );services/apps/merge_suggestions_worker/src/workflows/testMergingEntitiesWithLLM.ts (2)
71-76
: Consider refactoring to reduce code duplicationThe organization merging logic closely mirrors the member merging logic. Consider extracting the common pattern into a reusable function.
+async function processEntityMerge<T>( + entities: T[], + queryType: LlmQueryType, + entityType: string +): Promise<void> { + if (entities.length !== 2) { + console.log(`Failed getting ${entityType} data in suggestion. Skipping suggestion.`); + return; + } + + const res = await commonActivitiesProxy.getLLMResult(queryType, entities); + console.log(`Raw res: `); + console.log(res.answer); + return res; +} // Then use it in both places: -const res = await commonActivitiesProxy.getLLMResult( - LlmQueryType.ORGANIZATION_MERGE, - organizations, -); -console.log(`Raw res: `); -console.log(res.answer); +const res = await processEntityMerge( + organizations, + LlmQueryType.ORGANIZATION_MERGE, + 'organization' +);
Line range hint
80-84
: Consider enhancing token counting metricsThe token counting logic could be enhanced to provide more detailed metrics:
+interface TokenMetrics { + totalTokens: number; + averageTokens: number; + maxTokens: number; + minTokens: number; + promptCount: number; +} + +function calculateTokenMetrics(tokenCounts: number[]): TokenMetrics { + const total = tokenCounts.reduce((sum, count) => sum + count, 0); + return { + totalTokens: total, + averageTokens: Math.floor(total / tokenCounts.length), + maxTokens: Math.max(...tokenCounts), + minTokens: Math.min(...tokenCounts), + promptCount: tokenCounts.length + }; +} -console.log(`Total input token count: ${totalInputTokenCount}`); -console.log( - `Average input token count per prompt: ${Math.floor(totalInputTokenCount / promptCount)}`, -); +const metrics = calculateTokenMetrics(tokenCounts); +console.log('Token usage metrics:', metrics);services/apps/premium/members_enrichment_worker/src/activities/lf-auth0/githubIdentities.ts (4)
Line range hint
12-24
: Improve error handling to preserve contextThe current error handling loses the original error context. Consider using a more descriptive error message and preserving the error chain.
try { const db = svc.postgres.reader rows = await getGithubIdentities(db, limit, afterId, afterUsername) } catch (err) { - throw new Error(err) + throw new Error(`Failed to fetch GitHub identities: ${err.message}`, { cause: err }) }
Line range hint
44-67
: Add environment variable validation and error loggingThe function accesses environment variables without validation and silently removes invalid tokens.
const cache = new RedisCache(`lfx-auth0`, svc.redis, svc.log) + const tokens = process.env['CROWD_GITHUB_PERSONAL_ACCESS_TOKENS'] + if (!tokens) { + throw new Error('CROWD_GITHUB_PERSONAL_ACCESS_TOKENS environment variable is not set') + } const tokenRotator = new GithubTokenRotator( cache, - process.env['CROWD_GITHUB_PERSONAL_ACCESS_TOKENS'].split(','), + tokens.split(','), ) const tokens = await tokenRotator.getAllTokens() for (const token of tokens) { try { await tokenRotator.updateRateLimitInfoFromApi(token, GithubAPIResource.CORE) } catch (e) { // something is wrong with the token, remove it from the list + svc.log.warn(`Removing invalid GitHub token: ${e.message}`) tokenRotator.removeToken(token) } }
Line range hint
69-106
: Enhance API interaction robustness and maintainabilitySeveral improvements can be made to make the code more maintainable and reliable:
- Extract API URL to a constant
- Use constants for status codes
- Consistent error handling
- Add request timeout
+const GITHUB_API_BASE_URL = 'https://api.github.com' +const RATE_LIMIT_STATUS = 429 +const API_TIMEOUT_MS = 5000 export async function findGithubSourceId(username: string): Promise<string | null> { try { const cache = new RedisCache(`lfx-auth0`, svc.redis, svc.log) const tokenRotator = new GithubTokenRotator( cache, process.env['CROWD_GITHUB_PERSONAL_ACCESS_TOKENS'].split(','), ) const token = await tokenRotator.getToken() - const response = await fetch(`https://api.github.com/users/${username}`, { + const controller = new AbortController() + const timeout = setTimeout(() => controller.abort(), API_TIMEOUT_MS) + + const response = await fetch(`${GITHUB_API_BASE_URL}/users/${username}`, { headers: { Authorization: `Bearer ${token}`, }, + signal: controller.signal, }) + clearTimeout(timeout) + if (!response.ok) { - console.error('HTTP error', response.status) - if (response.status === 429) { - throw new Error('Rate limit exceeded') + svc.log.error('GitHub API error', { status: response.status, username }) + if (response.status === RATE_LIMIT_STATUS) { + throw new Error('GitHub API rate limit exceeded') } return null }
Line range hint
1-106
: Consider architectural improvements for better maintainabilityThe file handles multiple concerns that could be better separated:
- Token management could be moved to a dedicated service
- API interactions could be abstracted into a GitHub API client class
- Configuration could be centralized using a config service
This separation would improve testability and maintenance.
Consider restructuring the code into these modules:
// githubApiClient.ts class GitHubApiClient { constructor(private tokenRotator: GithubTokenRotator) {} async getUser(username: string): Promise<IGithubUser> } // tokenService.ts class TokenService { constructor(private cache: RedisCache) {} async rotateTokens(): Promise<void> } // config.ts interface Config { github: { apiUrl: string tokens: string[] timeout: number } }services/apps/premium/members_enrichment_worker/src/workflows/enrichMember.ts (1)
41-42
: Consider optimizing cache lookup for single source caseThe change from single to batch lookup is good for extensibility, but might be less efficient when we know we're only looking up one source. Consider adding a fast path for single source lookups.
- const caches = await findMemberEnrichmentCache([source], input.id) - const cache = caches.find((c) => c.source === source) + const cache = sources.length === 1 + ? (await findMemberEnrichmentCache([source], input.id))[0] + : (await findMemberEnrichmentCache([source], input.id)).find((c) => c.source === source)services/libs/types/src/members.ts (1)
Line range hint
163-176
: Consider adding JSDoc comments to document the interface purpose.The addition of the
id
field is correct. However, since this interface appears to be a transformed version ofILLMConsumableMemberDbResult
, it would be helpful to document the relationship and purpose.Add documentation like this:
+/** + * Represents a cleaned/transformed version of ILLMConsumableMemberDbResult + * for LLM consumption with simplified identity fields. + */ export interface ILLMConsumableMember { id: string // ... rest of the interfaceservices/archetypes/worker/src/index.ts (1)
233-246
: Avoid hardcoded paths in worker configurationThe hardcoded paths './src/workflows' and './src/activities' might break in different environments. Consider making these configurable.
+const WORKFLOW_PATH = process.env['CROWD_WORKFLOW_PATH'] || './src/workflows' +const ACTIVITIES_PATH = process.env['CROWD_ACTIVITIES_PATH'] || './src/activities' const workflowBundle = await bundleWorkflowCode({ - workflowsPath: path.resolve('./src/workflows'), + workflowsPath: path.resolve(WORKFLOW_PATH), }) this._worker = await TemporalWorker.create({ // ... other config - activities: require(path.resolve('./src/activities')), + activities: require(path.resolve(ACTIVITIES_PATH)), // ... rest of config })services/apps/merge_suggestions_worker/src/activities/common.ts (2)
Line range hint
77-81
: Use the logging service instead ofconsole.log
inmergeMembers
In the catch block of
mergeMembers
, consider using the application's logging service (svc.log
) instead ofconsole.log
for consistent logging and better log management.Also, correct the typo in the error message.
Apply this diff to update the logging:
} catch (error) { - console.log(`Failed merging member wit status [${error.response.status}]. Skipping!`) + svc.log.error(`Failed merging member with status [${error.response.status}]. Skipping!`, error) }
Line range hint
100-104
: Use the logging service instead ofconsole.log
inmergeOrganizations
Similarly, in the
mergeOrganizations
function, replaceconsole.log
withsvc.log.error
for consistent logging.Apply this diff to update the logging:
} catch (error) { - console.log(`Failed merging organization with status [${error.response.status}]. Skipping!`) + svc.log.error(`Failed merging organization with status [${error.response.status}]. Skipping!`, error) }services/libs/common_services/src/services/llm.service.ts (1)
29-191
: Consider adding unit tests for theLlmService
methodsThe
LlmService
class is central to your application's interaction with LLMs. Adding unit tests for its methods—queryLlm
,consolidateMemberEnrichmentData
, andmergeSuggestion
—will enhance code reliability and facilitate future maintenance. Unit tests can help catch bugs early and ensure that changes do not introduce regressions.Would you like assistance in setting up unit tests for this service? I can help outline test cases or provide example implementations.
services/libs/data-access-layer/src/old/apps/merge_suggestions_worker/memberMergeSuggestions.repo.ts (1)
262-272
: Improve query performance by indexing JSONB fields used in WHERE clausesAccessing JSONB fields in the
WHERE
clause, such aslsv.metadata ->> 'secondaryId'
, can lead to performance issues if the appropriate indexes are not in place. To optimize query performance, consider adding an index on themetadata
field for the'secondaryId'
key.Example of creating an index:
CREATE INDEX idx_llmPromptHistory_secondaryId ON "llmPromptHistory" ((metadata ->> 'secondaryId'));
This index will help expedite queries filtering on metadata ->> 'secondaryId'.
Line range hint
4-11
: Consider updating import paths from 'old' directoryThe functions are being imported from
@crowd/data-access-layer/src/old/apps/premium/members_enrichment_worker
. Importing from an 'old' directory suggests reliance on deprecated or outdated code. Consider updating to the latest versions or refactoring the code to use the current data access layer.
148-148
: Consider using a more descriptive variable nameThe variable
toBeSquashed
may benefit from a more descriptive name, such asnormalizedDataBySource
, to improve code readability and maintainability.
154-154
: Simplify condition using optional chainingThe condition
if (cache && cache.data)
can be simplified using optional chaining:-if (cache && cache.data) { +if (cache?.data) {🧰 Tools
🪛 Biome
[error] 154-154: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
160-174
: Remove commented-out code to improve readabilityThere is a large block of commented-out code which may reduce readability. If this code is no longer needed, consider removing it or preserving it through version control instead of commenting it out.
184-184
: Review the TODO comment regarding LLM data squasherThe TODO comment indicates that the data squasher using LLM needs implementation, but the code appears to instantiate and use
llmService.consolidateMemberEnrichmentData
. Please update or remove the TODO comment to reflect the current state of the code.
195-305
: Consider externalizing the LLM prompt for maintainability
The LLM prompt is extensive and embedded directly in the code. Consider moving it to a separate file or using a template system to improve readability and maintainability, as sketched below.
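One way the externalization could look (the paths and names here are hypothetical):
import { readFileSync } from 'fs'
import path from 'path'

// Load the prompt template once at module load; fill in the member-specific
// payload at call time.
const PROMPT_TEMPLATE = readFileSync(
  path.resolve(__dirname, 'prompts/member-enrichment.txt'),
  'utf8',
)

export function buildEnrichmentPrompt(payload: unknown): string {
  return PROMPT_TEMPLATE.replace('{{DATA}}', JSON.stringify(payload))
}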
services/apps/merge_suggestions_worker/src/activities/organizationMergeSuggestions.ts (1)
Line range hint
361-373
: Refactor attribute extraction to improve readability and maintainability
The current implementation repeats the pattern of filtering and mapping attributes to extract values. Consider abstracting this logic into helper functions to reduce redundancy and enhance code clarity.
Here's how you could refactor the code:
First, define helper functions:
function getAttributeValue(attributes: any[], name: string): string { return attributes.find((a) => a.name === name)?.value || ''; } function getAttributeValues(attributes: any[], name: string): string[] { return attributes.filter((a) => a.name === name).map((a) => a.value); }Then, update the return statement:
return { id: base.id, displayName: base.displayName, description: base.description, - phoneNumbers: attributes.filter((a) => a.name === 'phoneNumber').map((a) => a.value), + phoneNumbers: getAttributeValues(attributes, 'phoneNumber'), logo: base.logo, tags: base.tags, location: base.location, type: base.type, - geoLocation: attributes.find((a) => a.name === 'geoLocation')?.value || '', - ticker: attributes.find((a) => a.name === 'ticker')?.value || '', - profiles: attributes.filter((a) => a.name === 'profile').map((a) => a.value), headline: base.headline, industry: base.industry, founded: base.founded, - alternativeNames: attributes.filter((a) => a.name === 'alternativeName').map((a) => a.value), + geoLocation: getAttributeValue(attributes, 'geoLocation'), + ticker: getAttributeValue(attributes, 'ticker'), + profiles: getAttributeValues(attributes, 'profile'), + alternativeNames: getAttributeValues(attributes, 'alternativeName'), identities: identities.map((i) => ({ platform: i.platform, value: i.value, })), }This refactoring reduces code duplication and makes the code easier to maintain.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)
pnpm-lock.yaml
is excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (45)
backend/src/database/migrations/V1731052735__llm-prompt-history.sql
(1 hunks)backend/src/database/migrations/V1731484783__drop-member-deprecated-columns.sql
(1 hunks)services/apps/merge_suggestions_worker/package.json
(1 hunks)services/apps/merge_suggestions_worker/src/activities.ts
(1 hunks)services/apps/merge_suggestions_worker/src/activities/common.ts
(2 hunks)services/apps/merge_suggestions_worker/src/activities/memberMergeSuggestions.ts
(2 hunks)services/apps/merge_suggestions_worker/src/activities/organizationMergeSuggestions.ts
(1 hunks)services/apps/merge_suggestions_worker/src/types.ts
(0 hunks)services/apps/merge_suggestions_worker/src/workflows/mergeMembersWithLLM.ts
(2 hunks)services/apps/merge_suggestions_worker/src/workflows/mergeOrganizationsWithLLM.ts
(2 hunks)services/apps/merge_suggestions_worker/src/workflows/testMergingEntitiesWithLLM.ts
(3 hunks)services/apps/premium/members_enrichment_worker/package.json
(2 hunks)services/apps/premium/members_enrichment_worker/src/activities.ts
(2 hunks)services/apps/premium/members_enrichment_worker/src/activities/enrichment.ts
(4 hunks)services/apps/premium/members_enrichment_worker/src/activities/getMembers.ts
(1 hunks)services/apps/premium/members_enrichment_worker/src/activities/lf-auth0/authenticateLFAuth0.ts
(1 hunks)services/apps/premium/members_enrichment_worker/src/activities/lf-auth0/enrichLFAuth0.ts
(1 hunks)services/apps/premium/members_enrichment_worker/src/activities/lf-auth0/getEnrichmentLFAuth0.ts
(1 hunks)services/apps/premium/members_enrichment_worker/src/activities/lf-auth0/getLFIDEnrichableMembers.ts
(1 hunks)services/apps/premium/members_enrichment_worker/src/activities/lf-auth0/githubIdentities.ts
(1 hunks)services/apps/premium/members_enrichment_worker/src/activities/syncEnrichedData.ts
(1 hunks)services/apps/premium/members_enrichment_worker/src/main.ts
(1 hunks)services/apps/premium/members_enrichment_worker/src/schedules/getMembersToEnrich.ts
(1 hunks)services/apps/premium/members_enrichment_worker/src/service.ts
(1 hunks)services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts
(1 hunks)services/apps/premium/members_enrichment_worker/src/workflows/enrichMember.ts
(3 hunks)services/archetypes/worker/src/index.ts
(2 hunks)services/libs/common_services/package.json
(1 hunks)services/libs/common_services/src/repos/llmPromptHistory.repo.ts
(1 hunks)services/libs/common_services/src/services/index.ts
(1 hunks)services/libs/common_services/src/services/llm.service.ts
(1 hunks)services/libs/data-access-layer/src/old/apps/merge_suggestions_worker/llmSuggestionVerdicts.repo.ts
(0 hunks)services/libs/data-access-layer/src/old/apps/merge_suggestions_worker/memberMergeSuggestions.repo.ts
(3 hunks)services/libs/data-access-layer/src/old/apps/merge_suggestions_worker/organizationMergeSuggestions.repo.ts
(4 hunks)services/libs/data-access-layer/src/old/apps/premium/members_enrichment_worker/index.ts
(4 hunks)services/libs/types/src/enums/activities.ts
(0 hunks)services/libs/types/src/enums/index.ts
(1 hunks)services/libs/types/src/enums/llm.ts
(1 hunks)services/libs/types/src/enums/merging.ts
(0 hunks)services/libs/types/src/index.ts
(1 hunks)services/libs/types/src/llm.ts
(1 hunks)services/libs/types/src/members.ts
(2 hunks)services/libs/types/src/merging.ts
(0 hunks)services/libs/types/src/organizations.ts
(1 hunks)services/libs/types/src/premium/enrichment.ts
(1 hunks)
💤 Files with no reviewable changes (5)
- services/apps/merge_suggestions_worker/src/types.ts
- services/libs/data-access-layer/src/old/apps/merge_suggestions_worker/llmSuggestionVerdicts.repo.ts
- services/libs/types/src/enums/activities.ts
- services/libs/types/src/enums/merging.ts
- services/libs/types/src/merging.ts
✅ Files skipped from review due to trivial changes (6)
- backend/src/database/migrations/V1731484783__drop-member-deprecated-columns.sql
- services/apps/premium/members_enrichment_worker/src/activities/getMembers.ts
- services/apps/premium/members_enrichment_worker/src/activities/lf-auth0/authenticateLFAuth0.ts
- services/apps/premium/members_enrichment_worker/src/activities/lf-auth0/getEnrichmentLFAuth0.ts
- services/apps/premium/members_enrichment_worker/src/activities/syncEnrichedData.ts
- services/apps/premium/members_enrichment_worker/src/service.ts
🧰 Additional context used
🪛 Biome
services/apps/premium/members_enrichment_worker/src/activities/enrichment.ts
[error] 154-154: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
services/libs/common_services/src/repos/llmPromptHistory.repo.ts
[error] 6-8: This constructor is unnecessary.
Unsafe fix: Remove the unnecessary constructor.
(lint/complexity/noUselessConstructor)
🔇 Additional comments (49)
services/libs/common_services/src/services/index.ts (1)
2-2: LGTM! Verify the referenced service file.
The export statement follows the established barrel file pattern and correctly exposes the LLM service.
Let's verify the existence and structure of the referenced LLM service:
✅ Verification successful
Export and implementation of LlmService verified successfully
The LlmService class is properly implemented and exported:
- Extends LoggerBase with correct AWS Bedrock integration
- Implements core LLM functionality (queryLlm, consolidateMemberEnrichmentData, mergeSuggestion)
- Contains necessary type definitions and configurations
- Includes proper error handling and logging
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the existence and structure of the LLM service file
# Check if the service file exists and show its basic structure
fd --type f "llm.service.ts" --exec cat {} \;
# Verify exports in the service file
ast-grep --pattern 'export class LlmService extends $_'
Length of output: 8321
services/libs/types/src/enums/llm.ts (1)
6-10: LGTM! Verify the enum usage across the codebase.
The LlmQueryType enum is well-structured with clear, consistent naming. Let's verify its integration with other components.
✅ Verification successful
Enum usage is properly integrated across the codebase
The verification shows consistent usage of the LlmQueryType enum:
- Correctly used in LLM service configuration (services/libs/types/src/llm.ts)
- Properly integrated in merge suggestion workers for both members and organizations
- Correctly utilized in LLM service implementation (services/libs/common_services/src/services/llm.service.ts)
- String literals only appear in the enum definition and a database migration file where they're expected
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the usage of LlmQueryType enum across the codebase
# Expected: Should find references in merge suggestion workers and member enrichment code
# Search for direct enum usage
echo "=== Direct enum usage ==="
rg "LlmQueryType\." -A 2
# Search for string literals to ensure we haven't missed any hardcoded values
echo -e "\n=== Potential hardcoded values ==="
rg "'member_merge_suggestion'|'organization_merge_suggestion'|'member_enrichment'"
Length of output: 7015
services/apps/premium/members_enrichment_worker/src/main.ts (1)
4-4: Verify consistent service module usage across the codebase
Let's ensure that all related files are using the new service module correctly.
✅ Verification successful
Let me gather more information about the service module usage to ensure a complete verification.
Service module import and usage is consistent
The service module is properly defined in service.ts and correctly imported in main.ts. The exported svc instance is consistently used across the codebase for accessing various services (redis, postgres, opensearch, etc.) and logging. No issues were found with the service module implementation or its usage.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for consistent service module imports and potential missing updates
# Search for any remaining direct config/options usage that should use the service module
rg -g '!*.{json,md}' -l "const config.*=.*Config" "services/apps/premium/members_enrichment_worker"
rg -g '!*.{json,md}' -l "const options.*=.*Options" "services/apps/premium/members_enrichment_worker"
# Verify service module imports in related files
rg -g '!*.{json,md}' "import.*from.*['\"]./service['\"]" "services/apps/premium/members_enrichment_worker"
Length of output: 404
Script:
#!/bin/bash
# Check the content of service.ts and look for any exports
cat services/apps/premium/members_enrichment_worker/src/service.ts
# Look for any usage of 'svc' in the codebase
rg "svc\." services/apps/premium/members_enrichment_worker/src/
Length of output: 8593
services/apps/premium/members_enrichment_worker/src/activities/lf-auth0/getLFIDEnrichableMembers.ts (1)
4-4: Verify the new service import path
The import path has been changed from ../../main to ../../service. Let's verify this change is consistent across the codebase.
✅ Verification successful
Based on the shell script results, I can see that:
- The new service.ts file exists at services/apps/premium/members_enrichment_worker/src/service.ts
- There are no remaining imports from the old path (../../main)
- Multiple files are consistently using the new import path (../../service)
- The service.ts file exports the svc instance that's being imported
Therefore, I can generate the final response:
Import path change is correctly implemented ✅
The change from ../../main to ../../service is consistent across the codebase, with the new service.ts file properly exporting the required svc instance. All related files have been updated to use the new import path.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the service import path change and its consistency
# Check if the new service file exists
fd --type f "service.ts" --parent-dir "services/apps/premium/members_enrichment_worker/src"
# Check for any remaining imports from the old path
rg "from '../../main'" "services/apps/premium/members_enrichment_worker/src"
Length of output: 441
Script:
#!/bin/bash
# Description: Verify the service import path change and its consistency
# Check if the new service file exists
fd "service.ts" services/apps/premium/members_enrichment_worker/src
# Check for any remaining imports from the old path
rg "from '../../main'" services/apps/premium/members_enrichment_worker/src
# Check for imports from the new path
rg "from '../../service'" services/apps/premium/members_enrichment_worker/src
# Look at the content of service.ts if it exists
fd "service.ts" services/apps/premium/members_enrichment_worker/src -x cat {}
Length of output: 41985
services/libs/types/src/enums/index.ts (1)
20-20: LGTM! Verify module existence and breaking changes.
The new export statement follows the existing pattern and maintains alphabetical order.
Let's verify the module existence and check for any breaking changes:
✅ Verification successful
Export statement is valid and safe to use
The verification confirms:
- The
llm.ts
module exists and contains two new enums:LlmModelType
andLlmQueryType
- The old
LLMSuggestionVerdictType
enum has been completely removed frommerging.ts
- No references to
LLMSuggestionVerdictType
exist in the codebase, indicating a clean removal
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the existence of llm.ts and check for potential breaking changes
# Check if llm.ts exists in the expected location
fd -t f "llm.ts" services/libs/types/src/enums
# Check for any remaining references to LLMSuggestionVerdictType that might be affected
rg "LLMSuggestionVerdictType" -A 5
Length of output: 118
Script:
#!/bin/bash
# Let's check the content of llm.ts and look for any references to the old enum
cat services/libs/types/src/enums/llm.ts
# Let's also check the content of merging.ts to verify the removal
fd -t f "merging.ts" services/libs/types/src/enums
cat services/libs/types/src/enums/merging.ts
# Let's search for any potential usages of LLMSuggestionVerdictType in the entire codebase
# Using -w for word boundary to ensure exact matches
rg -w "LLMSuggestionVerdictType"
Length of output: 1257
services/libs/common_services/package.json (1)
15-15: Verify AWS SDK version compatibility and security.
Let's ensure the AWS SDK version is secure and compatible.
✅ Verification successful
AWS SDK version is secure but newer versions available
The version 3.572.0 has no known vulnerabilities, but it's several versions behind the latest (3.691.0). Since the caret (^) is used, it will accept compatible updates, which is reasonable for AWS SDK clients as they maintain backward compatibility within major versions.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for latest versions and security advisories for @aws-sdk/client-bedrock-runtime
# Check npm registry for latest versions
echo "Latest version information:"
npm view @aws-sdk/client-bedrock-runtime versions --json
# Check for known vulnerabilities
echo -e "\nVulnerability scan:"
npm audit --json @aws-sdk/[email protected]
Length of output: 2236
services/libs/types/src/index.ts (2)
63-64: LGTM! Export statement follows consistent pattern.
The new export for the LLM module is correctly placed and follows the established pattern in the barrel file.
64-64: Verify LLM types implementation.
Let's ensure that all necessary LLM-related types are properly defined and exported from the llm module.
✅ Verification successful
Based on the shell script results, I can now generate the final response since I have sufficient information about the LLM types implementation.
LLM types are properly implemented and being used correctly
The LLM module exports well-defined interfaces and enums that cover:
- Core interfaces: ILlmResponse, ILlmResult, ILlmSettings, ILlmHistoryEntry
- Model configuration: LLM_MODEL_REGION_MAP, LLM_MODEL_PRICING_MAP, LLM_SETTINGS
- Enums: LlmModelType, LlmQueryType
The types are being correctly imported and used in multiple workflow files for member and organization merging operations.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the implementation of LLM types and their exports
# Check the LLM types file structure and content
fd -t f "llm.ts" -X cat
# Find all files importing LLM types to understand usage patterns
rg -l "import.*from.*['\"].*types.*llm['\"]"
# Check for any potential type-related issues in files using LLM types
ast-grep --pattern 'import { $_ } from "@crowd/types"'
Length of output: 8579
services/libs/common_services/src/repos/llmPromptHistory.repo.ts (1)
1-5: LGTM! Clean imports and proper class declaration.
The imports are well-organized and the class properly extends RepositoryBase with correct type parameters.
services/apps/merge_suggestions_worker/src/activities.ts (2)
Line range hint 1-40: LGTM! Clean organization of imports and exports.
The imports and exports are well-organized into logical groups (members, organizations, LLM-related functions) and maintain good symmetry.
1-1: Verify the removal of saveLLMVerdict.
The removal of saveLLMVerdict from imports appears to be part of the architectural changes involving the new llmPromptHistory table. Let's verify there are no remaining references to this function.
✅ Verification successful
The removal of the saveLLMVerdict import is safe
No references to saveLLMVerdict were found in the TypeScript codebase, confirming that its removal from the imports is appropriate and won't cause any issues.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for any remaining references to saveLLMVerdict that might need updating
# Search for any remaining references to saveLLMVerdict
echo "Searching for remaining references to saveLLMVerdict..."
rg "saveLLMVerdict" --type ts
# Search for any TODO or FIXME comments related to LLM verdicts
echo "Checking for related TODO/FIXME comments..."
rg "TODO.*LLM.*verdict" --type ts
rg "FIXME.*LLM.*verdict" --type ts
# Look for potential calls to this function that might need updating
echo "Checking for potential function calls..."
ast-grep --pattern 'saveLLMVerdict($_)'
Length of output: 1066
services/apps/merge_suggestions_worker/src/workflows/mergeMembersWithLLM.ts (1)
3-3: Verify consistent usage of the new LlmQueryType across the codebase.
The migration from LLMSuggestionVerdictType to LlmQueryType suggests an API redesign. Let's ensure this change is consistently applied.
Also applies to: 7-7
✅ Verification successful
Based on the results, I can now generate the final response:
The migration from LLMSuggestionVerdictType to LlmQueryType is consistently implemented
The verification shows:
- No remaining references to the old LLMSuggestionVerdictType
- Consistent usage of LlmQueryType.MEMBER_MERGE across the codebase:
  - Defined in types (services/libs/types/src/llm.ts)
  - Used in data access layer
  - Properly integrated in LLM service
  - Correctly used in merge suggestion workflows
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for any remaining usage of the old type and verify new type adoption
# Check for any remaining references to the old type
echo "Checking for remaining LLMSuggestionVerdictType references:"
rg "LLMSuggestionVerdictType"
# Verify the usage pattern of the new type
echo -e "\nVerifying LlmQueryType usage pattern:"
rg "LlmQueryType\.MEMBER_MERGE" -A 2
Length of output: 3164
services/apps/merge_suggestions_worker/package.json (1)
19-19: LGTM! Verify workspace dependency.
The addition of @crowd/common_services as a workspace dependency aligns with the integration of LLM services.
Let's verify the dependency exists in the workspace:
✅ Verification successful
Dependency verified and properly integrated
The @crowd/common_services package:
- Exists as a workspace package at services/libs/common_services
- Contains the expected LLM service implementation at src/services/llm.service.ts
- Is properly referenced by multiple services including the merge suggestions worker
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the @crowd/common_services package exists in the workspace
# and contains the expected LLM service implementation
# Check if the package exists in the workspace
fd -t f "package.json" -x grep -l "@crowd/common_services"
# Verify LLM service implementation
fd -t f "llm.service.ts" services/libs/common_services
Length of output: 617
services/apps/premium/members_enrichment_worker/src/activities.ts (1)
9-9: LGTM! Verify the implementation of processMemberSources
The import is properly added and maintains consistency with the module's structure.
Let's verify the implementation of the new function:
✅ Verification successful
Implementation of processMemberSources is verified and working as expected
The implementation is found in services/apps/premium/members_enrichment_worker/src/activities/enrichment.ts and is properly integrated:
- The function is correctly implemented with proper error handling and logging
- It processes multiple enrichment sources for a member by:
- Retrieving existing enrichment caches for the member
- Normalizing and consolidating data from different sources
- Using LLM service to squash/merge data when multiple sources exist
- The function is properly imported and exported in the activities module
- It's correctly used in the enrichment workflow (enrichMember.ts)
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the implementation of processMemberSources
# Expected: Find the implementation and its test coverage
# Find the implementation
echo "Looking for processMemberSources implementation:"
ast-grep --pattern 'function processMemberSources' -A 10
# Check for test coverage
echo -e "\nLooking for corresponding tests:"
fd -e test.ts -e spec.ts | xargs rg "processMemberSources"
Length of output: 305
Script:
#!/bin/bash
# Let's try a different approach to find the implementation
# First, let's find the enrichment file that should contain the implementation
echo "Looking for the enrichment file:"
fd enrichment.ts
# Then let's examine its content
echo -e "\nChecking the content of enrichment files:"
fd enrichment.ts --exec cat {}
# Let's also try searching for any file containing processMemberSources
echo -e "\nSearching for files containing processMemberSources:"
rg -l "processMemberSources"
# And get the actual implementation with context
echo -e "\nGetting implementation context:"
rg -A 10 "processMemberSources"
Length of output: 37359
services/apps/merge_suggestions_worker/src/workflows/mergeOrganizationsWithLLM.ts (4)
Line range hint 22-63: LGTM! Robust workflow implementation
The workflow maintains a solid structure with:
- Proper batch processing using SUGGESTIONS_PER_RUN
- Good error handling for missing organization data
- Correct usage of Temporal's continueAsNew pattern for processing large datasets
Line range hint 54-59: Verify verdict persistence mechanism
The removal of saveLLMVerdict suggests a change in how verdicts are persisted. Please ensure that verdict tracking/auditing is handled appropriately in the new implementation.
#!/bin/bash
# Description: Verify how verdicts are now persisted
echo "Checking for new verdict persistence mechanisms..."
rg -A 5 "saveLLMVerdict|llmPromptHistory|storeLLMResult"
echo "Checking getLLMResult implementation..."
ast-grep --pattern 'getLLMResult($_)'
49-51: Verify LLM configuration management
The removal of hardcoded model configuration (REGION, MODEL_ID, etc.) suggests these are now managed elsewhere.
#!/bin/bash
# Description: Verify where LLM configuration is now managed
echo "Searching for LLM configuration..."
rg -A 5 "MODEL_ID|MODEL_ARGS|REGION"
echo "Checking LlmQueryType.ORGANIZATION_MERGE usage..."
ast-grep --pattern 'LlmQueryType.ORGANIZATION_MERGE'
3-3: Verify consistent usage of the new LlmQueryType
The change from LLMSuggestionVerdictType to LlmQueryType suggests a broader refactoring of LLM-related types.
Also applies to: 7-7
✅ Verification successful
Type change is correctly implemented and consistent
The change from LLMSuggestionVerdictType to LlmQueryType is part of a broader refactoring where:
- No traces of
LLMSuggestionVerdictType
remain in the codebase LlmQueryType
is consistently used across all relevant files- The enum is properly defined with all necessary values:
MEMBER_MERGE
,ORGANIZATION_MERGE
, andMEMBER_ENRICHMENT
- The implementation in
mergeOrganizationsWithLLM.ts
correctly uses the new type
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that LLMSuggestionVerdictType is completely replaced with LlmQueryType
# and there are no remaining usages of the old type.
echo "Checking for any remaining usage of LLMSuggestionVerdictType..."
rg "LLMSuggestionVerdictType"
echo "Verifying consistent usage of new LlmQueryType..."
rg "LlmQueryType"
Length of output: 9012
Script:
#!/bin/bash
# Let's search for both types more specifically and include some context
echo "Searching for LLMSuggestionVerdictType..."
rg -A 3 "LLMSuggestionVerdictType" --type typescript
echo "Searching for LlmQueryType usage..."
rg -A 3 "LlmQueryType" --type typescript
echo "Checking type definitions..."
fd -e ts -e d.ts -x cat {} \; -E "node_modules" | rg -A 5 "export (type|enum|interface) (LLMSuggestionVerdictType|LlmQueryType)"
Length of output: 753
Script:
#!/bin/bash
# Let's try without type specification and search the entire codebase
echo "Searching for any remaining LLMSuggestionVerdictType..."
rg -A 3 "LLMSuggestionVerdictType"
echo "Checking all usages of LlmQueryType..."
rg -A 3 "LlmQueryType"
# Let's also check the specific file for the usage context
echo "Checking the specific workflow file..."
cat services/apps/merge_suggestions_worker/src/workflows/mergeOrganizationsWithLLM.ts
Length of output: 15280
services/apps/premium/members_enrichment_worker/package.json (1)
20-20: LGTM: New dependency aligns with PR objectives.
The addition of @crowd/common_services as a workspace dependency is appropriate for the LLM-related functionalities mentioned in the PR summary.
services/libs/types/src/llm.ts (3)
3-28: LGTM! Well-structured interfaces with clear type definitions.
The interfaces are well-designed with appropriate use of TypeScript features:
- Good use of interface extension for ILlmHistoryEntry
- Proper generic type parameter in ILlmResult<T>
- Clear and specific type definitions
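To make the shapes concrete, here is a minimal sketch of the pattern being described. The ILlmResponse field names mirror the llmPromptHistory columns used elsewhere in this PR; the exact extension and generic shapes are assumptions, and the real definitions live in services/libs/types/src/llm.ts.

import { LlmModelType, LlmQueryType } from './enums/llm' // assumed relative import

export interface ILlmResponse {
  prompt: string
  answer: string
  inputTokenCount: number
  outputTokenCount: number
  responseTimeSeconds: number
}

// Interface extension: a history entry is a response plus bookkeeping fields.
export interface ILlmHistoryEntry extends ILlmResponse {
  type: LlmQueryType
  model: LlmModelType
  entityId?: string
  metadata?: Record<string, unknown>
}

// Generic parameter: the parsed answer travels alongside the raw response.
export interface ILlmResult<T> extends ILlmResponse {
  result: T
}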
67-68: Verify the high token limit for member enrichment
The token limit of 200,000 seems unusually high. For Claude models, max_tokens bounds the completion rather than the context window, so a value this large likely exceeds what the model will accept for output. Please verify if this is intentional and within the model's capabilities.
#!/bin/bash
# Search for any other token limits in the codebase for comparison
rg -g '!{dist,node_modules}' 'max_tokens'
30-33: Verify optimal regions for LLM models
Different regions are specified for different models:
- CLAUDE_3_OPUS: us-west-2
- CLAUDE_3_5_SONNET: us-east-1
Consider the impact on latency if your application servers are in different regions.
✅ Verification successful
Let me search for more context about the LLM configuration and usage to understand the application's primary region and LLM integration better.
Let me search for the actual usage of these LLM services to understand the deployment context better.
Based on the gathered information from the executed scripts, I can now provide a final response:
Region configuration appears intentional and aligned with service architecture
The region mapping in LLM_MODEL_REGION_MAP is used by the LLM service (llm.service.ts) to initialize Bedrock clients per region. The different regions for different models appear to be a deliberate design choice:
- The service maintains a client region map cache
- Bedrock clients are initialized on-demand for specific regions
- The application's other AWS services (S3, SQS) are configured independently
This suggests a distributed architecture where LLM services are intentionally accessed in their respective optimal regions, separate from other application services.
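To illustrate, a hypothetical sketch of that on-demand, per-region client cache. LLM_MODEL_REGION_MAP and LLM_SETTINGS are the real exports from services/libs/types/src/llm.ts; the function itself is illustrative rather than the actual llm.service.ts code.

import { BedrockRuntimeClient } from '@aws-sdk/client-bedrock-runtime'
import { LLM_MODEL_REGION_MAP, LLM_SETTINGS, LlmQueryType } from '@crowd/types'

const clientsByRegion = new Map<string, BedrockRuntimeClient>()

function getBedrockClient(queryType: LlmQueryType): BedrockRuntimeClient {
  // Each model is pinned to its own region, so the client is looked up
  // (and lazily created) per region rather than once per process.
  const region = LLM_MODEL_REGION_MAP[LLM_SETTINGS[queryType].modelId]
  let client = clientsByRegion.get(region)
  if (!client) {
    client = new BedrockRuntimeClient({ region })
    clientsByRegion.set(region, client)
  }
  return client
}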
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for any AWS region configurations or environment variables
# that might indicate the primary application region
rg -g '!{dist,node_modules}' -i 'AWS_REGION|AWS_DEFAULT_REGION'
Length of output: 1403
Script:
#!/bin/bash
# Search for LLM configuration and AWS client initialization
rg -g '!{dist,node_modules}' -B 3 -A 3 'new Bedrock|new BedrockRuntimeClient'
# Search for any configuration files or environment variables related to LLM services
rg -g '!{dist,node_modules}' -i 'BEDROCK.*REGION|LLM.*REGION'
# Look for any documentation about LLM configuration
fd README.md --exec cat {} \; | rg -i "llm|bedrock|region"
Length of output: 1208
Script:
#!/bin/bash
# Search for LLM service initialization and usage
ast-grep --pattern 'class $_ {
$$$
constructor($$$) {
$$$
LLM_MODEL_REGION_MAP
$$$
}
$$$
}'
# Search for environment configurations related to service deployment
rg -g '!{dist,node_modules}' -B 2 -A 2 'DEPLOYMENT_REGION|SERVICE_REGION|APP_REGION'
# Look for deployment configuration files
fd -e yaml -e yml --exec cat {} \; | rg -i 'region:|aws_region'
Length of output: 269
services/apps/premium/members_enrichment_worker/src/activities/lf-auth0/enrichLFAuth0.ts (1)
8-8: Verify the service layer restructuring
The import path has been updated from ../../main to ../../service, which appears to be part of a service layer restructuring.
Let's verify the consistency of this change across the codebase:
✅ Verification successful
Based on the shell script results, I can generate the final response as we have sufficient information:
Import path change is consistent and valid
The verification shows:
- No remaining imports from the old
../../main
path exist - The new
service.ts
file exists at the correct location - The new import pattern is consistently used across multiple files in the same directory:
- authenticateLFAuth0.ts
- getEnrichmentLFAuth0.ts
- getLFIDEnrichableMembers.ts
- githubIdentities.ts
- enrichLFAuth0.ts
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for any remaining imports from the old path and verify the new path exists
# Check if there are any remaining imports from the old path
echo "Checking for remaining imports from old path..."
rg "from '../../main'" services/apps/premium/members_enrichment_worker/
# Verify the new service file exists
echo "Verifying new service file exists..."
fd "service.ts" services/apps/premium/members_enrichment_worker/
# Check for consistency in import patterns
echo "Checking import patterns across the codebase..."
rg "from '../../service'" services/apps/premium/members_enrichment_worker/
Length of output: 1645
services/apps/premium/members_enrichment_worker/src/schedules/getMembersToEnrich.ts (3)
Line range hint 8-39: LGTM! The schedule configuration is well-structured
The schedule configuration for member enrichment is appropriate with:
- Daily execution at 6 AM
- Skip policy for overlapping runs
- Reasonable retry policy with exponential backoff
- Proper error handling for existing schedules
Line range hint 41-80: Review the LFID enrichment schedule configuration changes
Several significant changes require attention:
- The schedule ID change to '-2' suggests a breaking change. Was the old schedule properly deprecated?
- The production schedule (every 2 months) seems quite infrequent compared to the dev/test schedule (every 2 minutes).
- The overlap policy has been changed to BUFFER_ONE, which could affect the workflow execution pattern.
#!/bin/bash
# Description: Check for references to the old schedule ID and verify the workflow handling
# Check for references to the old schedule ID
echo "Checking for references to old schedule ID:"
rg "members-lfid-enrichment[^-]" --type ts
# Check the workflow implementation for timing dependencies
echo -e "\nChecking workflow implementation:"
rg -A 10 "getMembersForLFIDEnrichment" --type ts
Consider the following recommendations:
- Document the rationale for the 2-month production schedule interval
- Consider adding a staging environment with an intermediate schedule (e.g., daily) to catch issues earlier
- Add monitoring and alerts for failed enrichments to compensate for the infrequent production runs
5-5: Verify the service import path change across the codebase
The import path has been changed from '../main' to '../service'. Let's verify this change is consistent across the codebase.
✅ Verification successful
Import path change is consistent and valid
The import path change from '../main' to '../service' appears to be part of a consistent refactoring:
- No files are currently importing from '../main'
- Multiple files are already using the '../service' import
- Both service.ts and main.ts exist, confirming this is a valid refactoring
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for any remaining imports from '../main' that might need updating
# and verify the new '../service' path exists and is imported consistently
# Check for any remaining '../main' imports
echo "Checking for remaining '../main' imports:"
rg "from '\.\./main'" --type ts
# Check for consistency of '../service' imports
echo -e "\nChecking '../service' import usage:"
rg "from '\.\./service'" --type ts
Length of output: 5755
Script:
#!/bin/bash
# Let's check for both import patterns and also verify the existence of the service file
# Check for any files importing from '../main'
echo "Files importing from '../main':"
rg -l "from '\.\./main'" --type ts services/apps/premium/members_enrichment_worker/src/
# Check for files importing from '../service'
echo -e "\nFiles importing from '../service':"
rg -l "from '\.\./service'" --type ts services/apps/premium/members_enrichment_worker/src/
# Check if service.ts exists in the expected location
echo -e "\nChecking for service.ts file:"
fd "service.ts" services/apps/premium/members_enrichment_worker/src/
# Check if main.ts exists in the expected location
echo -e "\nChecking for main.ts file:"
fd "main.ts" services/apps/premium/members_enrichment_worker/src/
Length of output: 1497
backend/src/database/migrations/V1731052735__llm-prompt-history.sql (2)
15-18: Well-designed indexing strategy!
The indexes are well-chosen and will support efficient querying:
- Composite index supports queries filtering by both type and entityId
- Individual indexes support single-column filters
- Partial index on JSON data optimizes merge suggestion queries
54-62: Excellent safety check before table drop!
The verification logic before dropping the table is a great safety measure. It ensures:
- No data loss by checking for remaining rows
- Clear error message with actual row count
- Atomic operation within DO block
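For reference, the guard likely looks roughly like this sketch; the row-count check and error message are reconstructed from the description above, not copied from the migration.

do $$
declare
  remaining integer;
begin
  -- Refuse to drop the table while any rows remain unmigrated.
  select count(*) into remaining from "llmSuggestionVerdicts";
  if remaining > 0 then
    raise exception 'llmSuggestionVerdicts still has % rows, refusing to drop', remaining;
  end if;
  drop table "llmSuggestionVerdicts";
end $$;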
services/apps/merge_suggestions_worker/src/workflows/testMergingEntitiesWithLLM.ts (2)
3-4: LGTM! Clean import addition
The addition of the LlmQueryType import enhances type safety for LLM query operations.
44-50: Verify handling of removed LLM parameters
The streamlined getLLMResult call looks good, but we should verify that the removed parameters (modelId, prompt, region, modelSpecificArgs) are properly handled internally.
✅ Verification successful
Removed parameters are properly handled through centralized settings
The verification shows that the removed parameters are now managed through the LLM_SETTINGS configuration object, which provides type-specific settings including:
- modelId is pre-configured as CLAUDE_3_OPUS for member merges
- Model-specific arguments are defined with appropriate defaults
- Prompts are handled internally in the mergeSuggestion method
The streamlined API is an improvement as it enforces consistent configurations through the type system.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for the getLLMResult implementation to verify parameter handling
ast-grep --pattern 'getLLMResult($type, $data) {
$$$
}'
# Check for any default configurations that might handle these parameters
rg -l "modelId|prompt|region|modelSpecificArgs" --type typescript
Length of output: 162
Script:
#!/bin/bash
# Search for getLLMResult implementation with correct file extension
ast-grep --pattern 'getLLMResult($type, $data) {
$$$
}'
# Search for configuration files and usages with correct extension
rg -l "modelId|prompt|region|modelSpecificArgs" -g '*.ts' -g '*.tsx'
# Search for the common activities interface/class
rg -A 5 "interface.*CommonActivities|class.*CommonActivities" -g '*.ts' -g '*.tsx'
# Look for LLM related configuration files
fd -e ts -e tsx -e json "llm|LLM" --exec rg -l "modelId|prompt|region|config"
Length of output: 1222
Script:
#!/bin/bash
# Check the LLM service implementation and types
rg -A 10 "getLLMResult|LlmQueryType" services/libs/common_services/src/services/llm.service.ts services/libs/types/src/llm.ts
# Look for the implementation details of member merge type
rg -A 5 "MEMBER_MERGE" services/libs/types/src/llm.ts services/libs/common_services/src/services/llm.service.ts
# Check the LLM prompt history repository for default configurations
cat services/libs/common_services/src/repos/llmPromptHistory.repo.ts
Length of output: 9840
services/apps/premium/members_enrichment_worker/src/activities/lf-auth0/githubIdentities.ts (1)
9-9: Verify service import path consistency across the codebase
The import path has been changed from ../../main to ../../service. Let's ensure this change is consistent across all files.
✅ Verification successful
Import path change appears to be a deliberate service-specific pattern
The verification reveals a clear pattern:
- All files in
premium/members_enrichment_worker
consistently use../../service
- Other worker services consistently use
../../main
This suggests the import path change is a deliberate architectural decision specific to the premium members enrichment worker service, not an inconsistency that needs to be fixed.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for any remaining imports from '../../main'
rg "from '\.\./\.\./main'" --type ts
# Search for the new pattern to confirm consistency
rg "from '\.\./\.\./service'" --type ts
Length of output: 2462
services/apps/premium/members_enrichment_worker/src/workflows/enrichMember.ts (3)
Line range hint 63-88: LGTM! Robust handling of multiple GitHub identities
The implementation properly handles the edge case of multiple verified GitHub identities by selecting the one with the most activities. This is a good approach to resolve potential identity conflicts.
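In sketch form the tie-break amounts to the following; the identity shape and activityCount field are assumptions for illustration.

interface VerifiedIdentity {
  platform: string
  verified: boolean
  activityCount: number // assumed field name
}

function pickGithubIdentity(identities: VerifiedIdentity[]): VerifiedIdentity | undefined {
  // Keep only verified GitHub identities, then take the most active one.
  return identities
    .filter((i) => i.verified && i.platform === 'github')
    .sort((a, b) => b.activityCount - a.activityCount)[0]
}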
22-22: Verify the implementation of the new processMemberSources activity
The addition of this new activity appears to replace the previous normalization step. Let's ensure it's properly implemented.
104-104: Verify data consistency after removing normalization
The removal of the normalization step in favor of direct processing with processMemberSources simplifies the flow. However, we should ensure that any necessary data transformations are handled within the new activity.
✅ Verification successful
Let me gather more specific information about the data transformations and normalization logic.
Let me check the actual implementation of processMemberSources and any data transformation logic within it.
Let me check the implementation of processMemberSources and normalizeEnrichmentData to understand how the data transformation is handled.
Based on the code analysis, I can now provide a final response about the data consistency concern:
Data consistency is maintained through source-specific normalization
The removal of the normalization step in favor of direct processMemberSources is safe because:
- Each enrichment source (SERP, ProgAI, Clearbit, etc.) implements its own normalize method that handles data transformation
- The normalizeEnrichmentData function is still called within processMemberSources to transform source-specific data into a standardized IMemberEnrichmentDataNormalized format
- Common normalization utilities like normalizeAttributes and normalizeSocialIdentity are used consistently across sources
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for any existing normalization logic that might need to be preserved
echo "Searching for existing normalization patterns..."
rg -l "normalize.*Data"
# Search for data transformation in the new process
echo "Searching for data transformation in processMemberSources..."
ast-grep --pattern 'function processMemberSources($_, $_) {
$$$
}'
Length of output: 955
Script:
#!/bin/bash
# Check the implementation of processMemberSources
echo "Checking processMemberSources implementation..."
rg -A 10 "processMemberSources" services/apps/premium/members_enrichment_worker/src/
# Check the removed normalization code from git history
echo "Checking removed normalization code..."
git log -p | rg -A 10 -B 10 "normalization"
# Check the actual normalization implementations
echo "Checking normalization implementations..."
rg -A 10 "normalizeData" services/apps/premium/members_enrichment_worker/src/
Length of output: 29661
Script:
#!/bin/bash
# Check the implementation details of processMemberSources
echo "Checking processMemberSources implementation details..."
rg -B 5 -A 20 "export async function processMemberSources" services/apps/premium/members_enrichment_worker/src/activities/enrichment.ts
# Check for any data transformation logic in the sources
echo "Checking source-specific transformations..."
rg -A 10 "transform|normalize" services/apps/premium/members_enrichment_worker/src/sources/
Length of output: 62287
Script:
#!/bin/bash
# Check the full implementation of processMemberSources and normalizeEnrichmentData
echo "Checking processMemberSources and normalizeEnrichmentData implementations..."
rg -B 5 -A 30 "export async function processMemberSources|export async function normalizeEnrichmentData" services/apps/premium/members_enrichment_worker/src/activities/enrichment.ts
# Check for any data transformation in the enrichment activity
echo "Checking enrichment activity transformations..."
rg -A 10 "transform|normalize" services/apps/premium/members_enrichment_worker/src/activities/enrichment.ts
Length of output: 4914
services/libs/types/src/members.ts (1)
Line range hint 147-159: LGTM! The addition of the id field enhances type safety.
The id field is correctly placed and typed, ensuring proper identification of member entities in the database results.
services/libs/types/src/organizations.ts (1)
169-169: LGTM! Addition of required id field aligns with interface design patterns.
The addition of the required id field to ILLMConsumableOrganization brings consistency with other organization interfaces in the codebase and ensures proper organization tracking in LLM operations.
services/archetypes/worker/src/index.ts (2)
103-103: LGTM: Good separation of concerns
The optional initWorker parameter with a default value of true maintains backward compatibility while providing flexibility to initialize connections without starting the worker.
206-252: LGTM: Clean conditional initialization
The worker initialization is properly encapsulated in a conditional block with consistent error handling patterns.
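The shape of the change is roughly the following; the helper names are placeholders, and the real code is in services/archetypes/worker/src/index.ts.

declare function initConnections(): Promise<void> // assumed: redis/postgres/temporal bootstrap
declare function createAndRunWorker(): Promise<void> // assumed: Temporal worker startup

export async function init(initWorker = true): Promise<void> {
  await initConnections()
  // Worker creation stays behind the flag, so callers such as one-off scripts
  // can reuse the connection bootstrap without starting a Temporal worker.
  if (initWorker) {
    await createAndRunWorker()
  }
}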
services/libs/data-access-layer/src/old/apps/merge_suggestions_worker/organizationMergeSuggestions.repo.ts (3)
Line range hint 208-214: Documentation accurately reflects architectural changes
The updated documentation clearly explains the transition to using the llmPromptHistory table while maintaining clarity about the method's functionality.
5-5: Verify type replacement across the codebase
The change from LLMSuggestionVerdictType to LlmQueryType appears to be part of a broader refactoring. Let's verify all usages have been updated consistently.
✅ Verification successful
Type replacement has been consistently updated across the codebase
The verification shows that:
- No references to the old LLMSuggestionVerdictType exist in the codebase
- LlmQueryType is properly defined in services/libs/types/src/enums/llm.ts
- All usages of LlmQueryType are consistent across the codebase, including:
  - Merge suggestion workers
  - Data access layer repositories
  - Common services
  - Type definitions
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for any remaining references to the old type
rg "LLMSuggestionVerdictType"
# Search for new type usage to ensure consistent implementation
rg "LlmQueryType"
Length of output: 3741
257-267: Consider performance optimization for JSON field access
The queries now use JSON extraction (metadata ->> 'secondaryId') in WHERE clauses, which might impact performance. Consider:
- Adding an index on the JSON field if not already present
- Alternatively, consider storing secondaryId as a separate column if it's frequently queried
If you'd like to improve query performance, here's a suggested index:
CREATE INDEX idx_llm_prompt_history_metadata_secondary_id
ON "llmPromptHistory" ((metadata->>'secondaryId'));
Let's verify if this index exists:
Also applies to: 277-287
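As a sketch, one way to run that check is to search the migrations for the suggested index name:

#!/bin/bash
# Sketch (not the original verification script): look for the suggested index
# name in the migration files.
rg "idx_llm_prompt_history_metadata_secondary_id" backend/src/database/migrations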
services/apps/merge_suggestions_worker/src/activities/memberMergeSuggestions.ts (2)
310-310: LGTM: Addition of primary member ID field
The addition of the id field for the primary member aligns with the interface requirements and provides necessary identification for LLM processing.
327-327: LGTM: Addition of secondary member ID field
The addition of the id field for the secondary member maintains consistency with the primary member structure and completes the interface requirements.
Let's verify the interface compliance:
✅ Verification successful
The verification results show that:
- The ILLMConsumableMember interface in services/libs/types/src/members.ts explicitly requires the id field as a string.
- The implementation for the primary member includes all required fields including id.
However, let me verify the secondary member implementation to ensure complete interface compliance:
Let me try a different pattern to locate the secondary member implementation:
LGTM: Secondary member object correctly implements ILLMConsumableMember interface
The addition of id: secondaryMember.id is correct, and the secondary member object fully implements the ILLMConsumableMember interface, matching the structure of the primary member. The implementation includes all required fields: id, displayName, joinedAt, attributes, identities, and organizations with their respective properties.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that ILLMConsumableMember interface requires the id field
# and that all implementations include it
# Search for the interface definition and its usage
ast-grep --pattern 'interface ILLMConsumableMember {
$$$
}'
# Search for all object constructions matching the interface pattern
ast-grep --pattern '{
id: $_,
displayName: $_,
joinedAt: $_,
$$$
}'
Length of output: 2787
Script:
#!/bin/bash
# Search for the secondary member object construction
ast-grep --pattern '{
id: secondaryMember.id,
displayName: secondaryMember.$_,
joinedAt: secondaryMember.$_,
$$$
}'
Length of output: 125
Script:
#!/bin/bash
# Search for the secondary member object construction with a more flexible pattern
rg -A10 "secondaryMember\.id" services/apps/merge_suggestions_worker/src/activities/memberMergeSuggestions.ts
Length of output: 608
services/apps/merge_suggestions_worker/src/activities/common.ts (1)
44-46: Verify all calls to getLLMResult are updated
The function signature of getLLMResult has changed to include the type parameter of LlmQueryType. Ensure that all calls to getLLMResult in the codebase are updated to match the new signature.
Run the following script to find usages of getLLMResult:
✅ Verification successful
Based on the rg search results, I can generate the final response as I have enough information to verify the concern.
All calls to getLLMResult are properly updated ✅
All 4 calls to getLLMResult in the codebase are already passing the correct type parameter (LlmQueryType.MEMBER_MERGE or LlmQueryType.ORGANIZATION_MERGE) as the first argument, matching the new function signature.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Find all usages of 'getLLMResult' in the codebase.
rg 'getLLMResult\(' -A 2
Length of output: 1966
services/libs/data-access-layer/src/old/apps/merge_suggestions_worker/memberMergeSuggestions.repo.ts (2)
6-6: Importing LlmQueryType is appropriate for updated query handling
The addition of LlmQueryType in the import statements correctly reflects the updated types used for query operations in this file.
262-272: Ensure safe casting of JSONB field to UUID to prevent runtime errors
When casting (lsv.metadata ->> 'secondaryId')::uuid, the query assumes that the metadata JSONB field always contains the 'secondaryId' key with a valid UUID value. If this key is missing or the value is not a valid UUID, the cast will fail and result in a runtime error.
Run the following script to check for entries where secondaryId is missing or invalid:
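A sketch of such a query, assuming the migrated type value shown in the migration above:

-- Sketch: history rows whose metadata lacks secondaryId or holds a non-UUID value.
select metadata
from "llmPromptHistory"
where type = 'member_merge_suggestion'
  and (
    metadata ->> 'secondaryId' is null
    or (metadata ->> 'secondaryId') !~* '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
  );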
Consider modifying the query to safely handle cases where secondaryId might be missing or invalid:
- (lsv.metadata ->> 'secondaryId')::uuid = mtmr."toMergeId" and
+ (lsv.metadata ->> 'secondaryId') IS NOT NULL AND
+ lsv.metadata ->> 'secondaryId' = mtmr."toMergeId"::text and
This change avoids casting to uuid and compares the values as text after ensuring secondaryId is not NULL.
services/libs/data-access-layer/src/old/apps/premium/members_enrichment_worker/index.ts (3)
16-68: Function fetchMemberDataForLLMSquashing is properly implemented
The newly added function fetchMemberDataForLLMSquashing correctly retrieves the member data along with associated organizations and verified identities. The SQL query is well-structured, uses parameter binding to prevent SQL injection, and aggregates the data as expected.
527-529: Function resetMemberEnrichedAt is correctly added
The resetMemberEnrichedAt function effectively resets the lastEnriched field to null for a given member, which is essential for triggering re-enrichment.
582-595: Update all callers of findMemberEnrichmentCacheDb due to signature change
The function findMemberEnrichmentCacheDb now accepts an array of MemberEnrichmentSource as the sources parameter instead of a single source. Ensure that all function calls throughout the codebase are updated to pass an array of sources to prevent runtime errors.
Run the following script to locate and review all calls to findMemberEnrichmentCacheDb:
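A minimal version of that script could be:

#!/bin/bash
# Sketch: list every call site of findMemberEnrichmentCacheDb with context so
# the new sources-array argument can be reviewed.
rg -n "findMemberEnrichmentCacheDb\(" --type ts -A 2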
public async insertPromptHistoryEntry(
  type: LlmQueryType,
  model: LlmModelType,
  result: ILlmResponse,
  entityId?: string,
  metadata?: Record<string, unknown>,
): Promise<void> {
🛠️ Refactor suggestion
Add input validation for required fields.
The method should validate that all required fields in the ILlmResponse object are present before proceeding with the insert operation.
public async insertPromptHistoryEntry(
type: LlmQueryType,
model: LlmModelType,
result: ILlmResponse,
entityId?: string,
metadata?: Record<string, unknown>,
): Promise<void> {
+ const requiredFields = ['prompt', 'answer', 'inputTokenCount', 'outputTokenCount', 'responseTimeSeconds'];
+ const missingFields = requiredFields.filter(field => !(field in result));
+ if (missingFields.length > 0) {
+ throw new Error(`Missing required fields in ILlmResponse: ${missingFields.join(', ')}`);
+ }
await this.db().none(
  `
  insert into "llmPromptHistory"(type, model, "entityId", metadata, prompt, answer, "inputTokenCount", "outputTokenCount", "responseTimeSeconds")
  values($(type), $(model), $(entityId), $(metadata), $(prompt), $(answer), $(inputTokenCount), $(outputTokenCount), $(responseTimeSeconds));
  `,
  {
    type,
    model,
    entityId,
    metadata: metadata ? JSON.stringify(metadata) : null,
    ...result,
  },
)
}
🛠️ Refactor suggestion
Add error handling with specific error types.
The database operation should be wrapped in a try-catch block with appropriate error handling.
+ try {
await this.db().none(
`
insert into "llmPromptHistory"(type, model, "entityId", metadata, prompt, answer, "inputTokenCount", "outputTokenCount", "responseTimeSeconds")
values($(type), $(model), $(entityId), $(metadata), $(prompt), $(answer), $(inputTokenCount), $(outputTokenCount), $(responseTimeSeconds));
`,
{
type,
model,
entityId,
metadata: metadata ? JSON.stringify(metadata) : null,
...result,
},
)
+ } catch (error) {
+ this.log.error('Failed to insert LLM prompt history', { error, type, model, entityId });
+ throw new Error('Failed to insert LLM prompt history');
+ }
@@ -6,6 +6,7 @@
"start:debug": "CROWD_TEMPORAL_TASKQUEUE=members-enrichment SERVICE=members-enrichment-worker LOG_LEVEL=trace tsx --inspect=0.0.0.0:9232 src/main.ts",
"dev:local": "nodemon --watch src --watch ../../../libs --ext ts --exec pnpm run start:debug:local",
"dev": "nodemon --watch src --watch ../../../libs --ext ts --exec pnpm run start:debug",
"script:onboarding": "SERVICE=script tsx --inspect src/bin/onboarding.ts",
💡 Codebase verification
Script path does not exist and needs to be created
The src/bin/onboarding.ts path referenced in the package.json script does not exist in the codebase. The script cannot be executed until:
- The
bin
directory is created undersrc
- The
onboarding.ts
script is implemented
Additionally:
- Remove the
--inspect
flag from the production script - Consider adding a corresponding test script
🔗 Analysis chain
Verify the onboarding script path and consider improvements.
The new script uses the --inspect flag which enables debugging. Consider:
- Adding a non-debug version for production use
- Adding a corresponding test script
Let's verify the script path exists:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the onboarding script path exists
# Expected: The file should exist at the specified path
fd "onboarding.ts" "services/apps/premium/members_enrichment_worker/src/bin"
Length of output: 217
Script:
#!/bin/bash
# Let's first check if the directory structure exists
fd "bin" "services/apps/premium/members_enrichment_worker"
# Also search for onboarding.ts anywhere in the service directory
fd "onboarding.ts" "services/apps/premium/members_enrichment_worker"
# Check if there are any recent changes related to onboarding.ts
git diff HEAD~1 HEAD -- "services/apps/premium/members_enrichment_worker"
Length of output: 6863
Script:
#!/bin/bash
# Let's check if there's a src directory and its structure
ls -la services/apps/premium/members_enrichment_worker/src
# Check if there are any TypeScript files in the root src directory
# that might be related to onboarding
fd -e ts . "services/apps/premium/members_enrichment_worker/src"
Length of output: 3834
services/libs/types/src/llm.ts
Outdated
export const LLM_SETTINGS: Record<LlmQueryType, ILlmSettings> = {
  [LlmQueryType.MEMBER_MERGE]: {
    modelId: LlmModelType.CLAUDE_3_OPUS,
    arguments: {
      max_tokens: 2000,
      anthropic_version: 'bedrock-2023-05-31',
      temperature: 0,
    },
  },
  [LlmQueryType.ORGANIZATION_MERGE]: {
    modelId: LlmModelType.CLAUDE_3_OPUS,
    arguments: {
      max_tokens: 2000,
      anthropic_version: 'bedrock-2023-05-31',
      temperature: 0,
    },
  },
  [LlmQueryType.MEMBER_ENRICHMENT]: {
    modelId: LlmModelType.CLAUDE_3_5_SONNET,
    arguments: {
      // TODO uros check if this is ok
      max_tokens: 200000,
      anthropic_version: 'bedrock-2023-05-31',
      temperature: 0,
    },
  },
}
🛠️ Refactor suggestion
Refactor duplicate configuration
The merge operations (MEMBER_MERGE and ORGANIZATION_MERGE) share identical configuration. Consider extracting this to reduce duplication.
+const mergeConfig: ILlmSettings = {
+  modelId: LlmModelType.CLAUDE_3_OPUS,
+  arguments: {
+    max_tokens: 2000,
+    anthropic_version: 'bedrock-2023-05-31',
+    temperature: 0,
+  },
+}
+
 export const LLM_SETTINGS: Record<LlmQueryType, ILlmSettings> = {
-  [LlmQueryType.MEMBER_MERGE]: {
-    modelId: LlmModelType.CLAUDE_3_OPUS,
-    arguments: {
-      max_tokens: 2000,
-      anthropic_version: 'bedrock-2023-05-31',
-      temperature: 0,
-    },
-  },
-  [LlmQueryType.ORGANIZATION_MERGE]: {
-    modelId: LlmModelType.CLAUDE_3_OPUS,
-    arguments: {
-      max_tokens: 2000,
-      anthropic_version: 'bedrock-2023-05-31',
-      temperature: 0,
-    },
-  },
+  [LlmQueryType.MEMBER_MERGE]: mergeConfig,
+  [LlmQueryType.ORGANIZATION_MERGE]: mergeConfig,
Committable suggestion skipped: line range outside the PR's diff.
insert into "llmPromptHistory"(type, model, "entityId", metadata, prompt, answer, "inputTokenCount", "outputTokenCount", "responseTimeSeconds")
select 'organization_merge_suggestion',
       model,
       "primaryId",
       json_build_object('secondaryId', "secondaryId"),
       prompt,
       verdict,
       "inputTokenCount",
       "outputTokenCount",
       "responseTimeSeconds"
from "llmSuggestionVerdicts"
where type = 'organization';

delete from "llmSuggestionVerdicts" where type = 'organization';

insert into "llmPromptHistory"(type, model, "entityId", metadata, prompt, answer, "inputTokenCount", "outputTokenCount", "responseTimeSeconds")
select 'member_merge_suggestion',
       model,
       "primaryId",
       json_build_object('secondaryId', "secondaryId"),
       prompt,
       verdict,
       "inputTokenCount",
       "outputTokenCount",
       "responseTimeSeconds"
from "llmSuggestionVerdicts"
where type = 'member';

delete from "llmSuggestionVerdicts" where type = 'member';
Add transaction safety to prevent data loss
The current migration lacks transaction safety. If the script fails between insert and delete operations, we might lose data or end up in an inconsistent state.
Wrap the operations in transactions and add validation:
+ begin;
+
+ -- Store counts for validation
+ do $$
+ declare
+ org_count int;
+ member_count int;
+ begin
+ select count(*) into org_count from "llmSuggestionVerdicts" where type = 'organization';
+ select count(*) into member_count from "llmSuggestionVerdicts" where type = 'member';
+
insert into "llmPromptHistory"(type, model, "entityId", metadata, prompt, answer, "inputTokenCount", "outputTokenCount", "responseTimeSeconds")
-- ... (organization insert query remains same) ...
where type = 'organization';
+ -- Validate organization migration
+ if (select count(*) from "llmPromptHistory" where type = 'organization_merge_suggestion') != org_count then
+ raise exception 'Organization data migration count mismatch';
+ end if;
delete from "llmSuggestionVerdicts" where type = 'organization';
insert into "llmPromptHistory"(type, model, "entityId", metadata, prompt, answer, "inputTokenCount", "outputTokenCount", "responseTimeSeconds")
-- ... (member insert query remains same) ...
where type = 'member';
+ -- Validate member migration
+ if (select count(*) from "llmPromptHistory" where type = 'member_merge_suggestion') != member_count then
+ raise exception 'Member data migration count mismatch';
+ end if;
delete from "llmSuggestionVerdicts" where type = 'member';
+ end $$;
+
+ commit;
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
insert into "llmPromptHistory"(type, model, "entityId", metadata, prompt, answer, "inputTokenCount", "outputTokenCount", "responseTimeSeconds") | |
select 'organization_merge_suggestion', | |
model, | |
"primaryId", | |
json_build_object( | |
'secondaryId', "secondaryId" | |
), | |
prompt, | |
verdict, | |
"inputTokenCount", | |
"outputTokenCount", | |
"responseTimeSeconds" | |
from "llmSuggestionVerdicts" | |
where type = 'organization'; | |
delete from "llmSuggestionVerdicts" where type = 'organization'; | |
insert into "llmPromptHistory"(type, model, "entityId", metadata, prompt, answer, "inputTokenCount", "outputTokenCount", "responseTimeSeconds") | |
select 'member_merge_suggestion', | |
model, | |
"primaryId", | |
json_build_object( | |
'secondaryId', "secondaryId" | |
), | |
prompt, | |
verdict, | |
"inputTokenCount", | |
"outputTokenCount", | |
"responseTimeSeconds" | |
from "llmSuggestionVerdicts" | |
where type = 'member'; | |
delete from "llmSuggestionVerdicts" where type = 'member'; | |
begin;

-- Store counts for validation
do $$
declare
  org_count int;
  member_count int;
begin
  select count(*) into org_count from "llmSuggestionVerdicts" where type = 'organization';
  select count(*) into member_count from "llmSuggestionVerdicts" where type = 'member';

  insert into "llmPromptHistory"(type, model, "entityId", metadata, prompt, answer, "inputTokenCount", "outputTokenCount", "responseTimeSeconds")
  select 'organization_merge_suggestion',
         model,
         "primaryId",
         json_build_object(
                 'secondaryId', "secondaryId"
         ),
         prompt,
         verdict,
         "inputTokenCount",
         "outputTokenCount",
         "responseTimeSeconds"
  from "llmSuggestionVerdicts"
  where type = 'organization';

  -- Validate organization migration
  if (select count(*) from "llmPromptHistory" where type = 'organization_merge_suggestion') != org_count then
    raise exception 'Organization data migration count mismatch';
  end if;

  delete from "llmSuggestionVerdicts" where type = 'organization';

  insert into "llmPromptHistory"(type, model, "entityId", metadata, prompt, answer, "inputTokenCount", "outputTokenCount", "responseTimeSeconds")
  select 'member_merge_suggestion',
         model,
         "primaryId",
         json_build_object(
                 'secondaryId', "secondaryId"
         ),
         prompt,
         verdict,
         "inputTokenCount",
         "outputTokenCount",
         "responseTimeSeconds"
  from "llmSuggestionVerdicts"
  where type = 'member';

  -- Validate member migration
  if (select count(*) from "llmPromptHistory" where type = 'member_merge_suggestion') != member_count then
    raise exception 'Member data migration count mismatch';
  end if;

  delete from "llmSuggestionVerdicts" where type = 'member';
end $$;

commit;
@@ -212,6 +212,7 @@ class MemberMergeSuggestionsRepository {
      const result: ILLMConsumableMemberDbResult[] = await this.connection.manyOrNone(
        `
      select
mem.id, |
Include mem.id in the GROUP BY clause to prevent SQL errors
The mem.id column has been added to the SELECT statement but is not included in the GROUP BY clause. In PostgreSQL, when using aggregation functions like jsonb_agg, all selected columns that are not aggregated must be included in the GROUP BY clause to avoid SQL errors.
Apply this diff to include mem.id in the GROUP BY clause:
group by
+ mem.id,
mI."memberId",
mem.attributes,
mem."displayName",
mem."joinedAt"
Committable suggestion skipped: line range outside the PR's diff.
const result = await llmService.consolidateMemberEnrichmentData(memberId, prompt)
svc.log.info({ memberId }, 'LLM result')
} else {
Add error handling for LLM service call
Currently, there is no error handling for the call to llmService.consolidateMemberEnrichmentData. If the LLM service fails or returns invalid data, it could cause unintended behavior. Consider adding try-catch blocks and appropriate error handling, as in the sketch below.
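A minimal sketch of such a guard, assuming the surrounding activity signals failure by returning false (the handling shown is illustrative, not the PR's actual code):

// Hypothetical guard around the call quoted above.
let result: ILlmResult<LlmMemberEnrichmentResult> | undefined
try {
  result = await llmService.consolidateMemberEnrichmentData(memberId, prompt)
} catch (err) {
  svc.log.error(err, { memberId }, 'LLM enrichment consolidation failed!')
  return false // assumption: the activity reports failure instead of rethrowing
}
if (!result || typeof result.result?.confidence !== 'number') {
  svc.log.warn({ memberId }, 'LLM returned an unexpected payload, skipping member!')
  return false
}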
const result = await llmService.consolidateMemberEnrichmentData(memberId, prompt)
svc.log.info({ memberId }, 'LLM result')
Avoid logging potentially sensitive data
The log statement svc.log.info({ memberId }, 'LLM result') might log sensitive data if the LLM result includes personal information. Ensure that sensitive data is not exposed in logs; one possible shape is sketched below.
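A sketch under the assumption that only aggregate fields are safe to log (redactLlmResult is a hypothetical helper, not part of the PR):

// Hypothetical helper: log only non-sensitive aggregates of the LLM response.
function redactLlmResult(result: ILlmResult<LlmMemberEnrichmentResult>) {
  return {
    confidence: result.result?.confidence,
    inputTokenCount: result.inputTokenCount,
    outputTokenCount: result.outputTokenCount,
    responseTimeSeconds: result.responseTimeSeconds,
  }
}

svc.log.info({ memberId, llm: redactLlmResult(result) }, 'LLM result')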
const res = await tx.query(
  `UPDATE "memberEnrichmentCache"
   SET
     "updatedAt" = NOW(),
     "data" = $2
   WHERE "memberId" = $1 and source = $3;`,
  [memberId, dataSanitized, source],
)
await resetMemberEnrichedAt(tx, memberId)
return res
🛠️ Refactor suggestion
Ensure error handling in updateMemberEnrichmentCacheDb
Similarly, in updateMemberEnrichmentCacheDb, ensure that resetMemberEnrichedAt is called only if the update operation affects rows. This adds robustness to the function by preventing unintended state changes.
Apply this diff to enhance error handling:
const res = await tx.query(
`UPDATE "memberEnrichmentCache"
SET
"updatedAt" = NOW(),
"data" = $2
WHERE "memberId" = $1 and source = $3;`,
[memberId, dataSanitized, source],
)
+if (res.rowCount > 0) {
+ await resetMemberEnrichedAt(tx, memberId)
+}
return res
const res = await tx.query(
  `INSERT INTO "memberEnrichmentCache" ("memberId", "data", "createdAt", "updatedAt", "source")
   VALUES ($1, $2, NOW(), NOW(), $3);`,
  [memberId, dataSanitized, source],
)
await resetMemberEnrichedAt(tx, memberId)
return res
🛠️ Refactor suggestion
Ensure error handling in insertMemberEnrichmentCacheDb
In insertMemberEnrichmentCacheDb, consider adding error handling to ensure that resetMemberEnrichedAt is only called if the insertion into "memberEnrichmentCache" is successful. This prevents resetting the lastEnriched timestamp when the cache update fails.
Apply this diff to enhance error handling:
const res = await tx.query(
`INSERT INTO "memberEnrichmentCache" ("memberId", "data", "createdAt", "updatedAt", "source")
VALUES ($1, $2, NOW(), NOW(), $3);`,
[memberId, dataSanitized, source],
)
+if (res.rowCount > 0) {
+ await resetMemberEnrichedAt(tx, memberId)
+}
return res
Actionable comments posted: 10
🧹 Outside diff range and nitpick comments (19)
backend/src/database/migrations/V1731052735__llm-prompt-history.sql (1)
15-17: Optimize index strategy
- The single indexes might be redundant, as the composite index ix_llmPromptHistory_type_entityId can be used for queries on either column.
- Consider adding NULLS LAST for better null handling in the indexes.

 create index "ix_llmPromptHistory_type_entityId" on "llmPromptHistory"("type", "entityId" NULLS LAST);
-create index "ix_llmPromptHistory_entityId" on "llmPromptHistory"("entityId");
-create index "ix_llmPromptHistory_type" on "llmPromptHistory"("type");
35-45
: Enhance pricing documentationThe comment "to estimate costs - these numbers can change" could be more informative. Consider:
- Adding a link to the official pricing documentation
- Including the last update date
- Documenting how to calculate total cost using these rates
Example documentation:
// Pricing as of April 2024 (see https://aws.amazon.com/bedrock/pricing/) // Total cost calculation: // Input cost = (input_tokens / 1000) * costPer1000InputTokens // Output cost = (output_tokens / 1000) * costPer1000OutputTokens // Total cost = Input cost + Output cost
91-112
: Add JSDoc comments for the enrichment result interfaceThe
LlmMemberEnrichmentResult
interface would benefit from documentation explaining:
- The meaning of the confidence score range
- The difference between 'update' and 'new' arrays
- The purpose of each nested structure
Example documentation:
/** * Represents the result of an LLM-based member enrichment operation * @property confidence - Score between 0-1 indicating the LLM's confidence in the enrichment * @property changes - Structured changes to be applied to the member * @property changes.identities - New identities to add and existing ones to update * @property changes.attributes - New attributes to add and existing ones to update * @property changes.organizations - New organizations and connections to be created */
30-56
: Consider environment-specific configuration managementThe hardcoded region map and model settings might need to vary across different environments (development, staging, production). Consider:
- Moving these configurations to environment variables
- Creating a configuration service
- Adding validation for required environment variables
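A minimal sketch of that validation, reusing the variable names that appear elsewhere in this PR (the helper itself is illustrative, not part of the change):

// Hypothetical startup check for LLM-related configuration.
const REQUIRED_ENV_VARS = [
  'CROWD_AWS_BEDROCK_ACCESS_KEY_ID',
  'CROWD_AWS_BEDROCK_SECRET_ACCESS_KEY',
] as const

function assertLlmConfig(): void {
  const missing = REQUIRED_ENV_VARS.filter((name) => !process.env[name])
  if (missing.length > 0) {
    throw new Error(`Missing required LLM configuration: ${missing.join(', ')}`)
  }
}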
services/libs/common_services/src/services/llm.service.ts (1)
28-62: Consider implementing cleanup for AWS clients
The client caching strategy is good for performance, but consider implementing a cleanup method to properly dispose of AWS clients when they're no longer needed. This prevents potential resource leaks, especially in long-running applications.

 export class LlmService extends LoggerBase {
   private readonly repo: LlmPromptHistoryRepository
   private readonly clientRegionMap: Map<string, BedrockRuntimeClient>
+
+  public async dispose(): Promise<void> {
+    for (const client of this.clientRegionMap.values()) {
+      await client.destroy()
+    }
+    this.clientRegionMap.clear()
+  }
export class LlmService extends LoggerBase { private readonly repo: LlmPromptHistoryRepository private readonly clientRegionMap: Map<string, BedrockRuntimeClient> + + public async dispose(): Promise<void> { + for (const client of this.clientRegionMap.values()) { + await client.destroy() + } + this.clientRegionMap.clear() + }services/libs/data-access-layer/src/old/apps/premium/members_enrichment_worker/index.ts (1)
535-537
: Add error handling toresetMemberEnrichedAt
The function should handle the case where the update affects no rows, as this might indicate a potential issue.
Apply this diff to add error handling:
export async function resetMemberEnrichedAt(tx: DbConnOrTx, memberId: string): Promise<void> { - await tx.none(`update members set "lastEnriched" = null where id = $(memberId);`, { memberId }) + const result = await tx.result(`update members set "lastEnriched" = null where id = $(memberId);`, { memberId }) + if (result.rowCount === 0) { + console.warn(`No member found with id ${memberId} when resetting lastEnriched`) + } }services/apps/premium/members_enrichment_worker/src/activities/enrichment.ts (1)
219-341
: Consider extracting the LLM prompt templateThe LLM prompt is quite long and contains complex validation rules. Consider extracting it to a separate template file for better maintainability.
Create a new file
llm-templates/member-enrichment.ts
:export const getMemberEnrichmentPrompt = ( existingMemberData: any, enrichedData: any, memberIdentityTypes: string[], organizationIdentityTypes: string[], organizationSources: string[], ) => ` You are a data consolidation expert... // Rest of the prompt template `Then update the code to use the template:
+ import { getMemberEnrichmentPrompt } from './llm-templates/member-enrichment' - const prompt = `...` + const prompt = getMemberEnrichmentPrompt( + existingMemberData, + toBeSquashed, + Object.values(MemberIdentityType), + Object.values(OrganizationIdentityType), + Object.values(OrganizationSource), + )services/apps/data_sink_worker/src/service/organization.service.ts (4)
Line range hint
85-85
: Consider reusing the existingOrganizationService
instance instead of creating a new oneAt line 85, a new instance of
OrganizationService
is created within theprocessOrganizationEnrich
method:const txService = new OrganizationService(txStore, this.log)Since
OrganizationService
is already instantiated, consider reusing the existing instance by passing the transactional store to the required methods or refactoring to support transactions without creating a new instance. This would improve maintainability and reduce unnecessary instantiation.
Line range hint
116-125
: Avoid mutating input parameters directlyIn the loop starting at line 116, the
organization.identities
array is being modified by pushing existing identities:for (const i of existingIdentities) { if ( organization.identities.find( (oi) => oi.type === i.type && oi.value === i.value && oi.platform === i.platform && oi.verified === i.verified, ) ) { continue } organization.identities.push(i) }Mutating the
organization
object directly can lead to unintended side effects, especially iforganization
is used elsewhere. Consider creating a new array or cloning the object before making modifications to prevent side effects.Apply this diff to avoid mutating the input parameter:
- organization.identities.push(i) + const updatedIdentities = [...organization.identities, i]And use
updatedIdentities
in place oforganization.identities
in subsequent code.
Line range hint
95-95
: Ensure the primary identity selection logic is robustAt line 95, the primary identity is selected as the first verified identity:
const primaryIdentity = verifiedIdentities[0]Ensure that this logic aligns with business requirements. If there are multiple verified identities, selecting the first one may not always be appropriate. Consider implementing logic to determine the primary identity based on specific criteria (e.g., priority of identity types, recency, etc.).
Line range hint
108-109
: Consider handling the case when no organization is found more explicitlyWhen no existing organization is found, the method logs a debug message and returns:
this.log.debug( 'No organization found for enriching. This organization enrich process had no affect.', )If the absence of an organization to enrich is expected behavior, consider whether additional actions are required. If it's an unexpected scenario, you may want to log a warning or handle it accordingly.
services/apps/premium/members_enrichment_worker/src/bin/onboarding.ts (5)
19-21: Enhance error messaging for missing arguments
Currently, if the required tenantId argument is missing, the script exits without any explanation. Consider adding an error message to inform the user about the correct usage.

Apply this diff to add informative error messaging:
 if (processArguments.length !== 1) {
+  console.error('Error: Exactly one argument (tenantId) is required')
+  console.error('Usage: node onboarding.ts <tenantId>')
   process.exit(1)
 }

23-23: Validate tenantId format
To prevent potential issues with invalid tenantId values, consider adding a validation check to ensure it conforms to the expected UUID v4 format.

Apply this diff to validate the tenantId:
 const tenantId = processArguments[0]
+if (!tenantId.match(/^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i)) {
+  console.error('Error: Invalid tenantId format. Expected UUID v4')
+  process.exit(1)
+}

25-26: Make configuration parameters adjustable
The minMemberActivities and maxConcurrentProcessing values are currently hard-coded. Consider making these parameters configurable via environment variables or command-line arguments to enhance flexibility.

Apply this diff to make the parameters configurable:
-const minMemberActivities = 100
-const maxConcurrentProcessing = 5
+const minMemberActivities = process.env.MIN_MEMBER_ACTIVITIES
+  ? parseInt(process.env.MIN_MEMBER_ACTIVITIES, 10)
+  : 100
+const maxConcurrentProcessing = process.env.MAX_CONCURRENT_PROCESSING
+  ? parseInt(process.env.MAX_CONCURRENT_PROCESSING, 10)
+  : 5
You may also update your documentation to reflect these new environment variables.

63-154: Simplify concurrency management using p-limit
To enhance code readability and simplify concurrency control, consider using the p-limit library to manage the maximum number of concurrent promises.

Apply this diff to refactor the processing loop:
+import pLimit from 'p-limit'

 const sources = Object.values(MemberEnrichmentSource) as MemberEnrichmentSource[]

 setImmediate(async () => {
   await svc.init(false)

-  let processingCount = 0
   let updatedMembersCount = 0
   let skippedMembersCount = 0
   let failedMembersCount = 0
   let totalProcessingTime = 0
   let processedMembersCount = 0
   const REPORT_INTERVAL = 10

   const pageSize = 100
   let members = await getEnrichableMembers(pageSize)
-  let pagePromises: Promise<void>[] = []
+
+  const limit = pLimit(maxConcurrentProcessing)

   while (members.length > 0) {
     svc.log.info({ memberCount: members.length }, 'Processing members!')

     // process members
-    for (const memberId of members) {
-      while (processingCount >= maxConcurrentProcessing) {
-        await timeout(100)
-      }
-
-      processingCount++
+    const tasks = members.map((memberId) =>
+      limit(async () => {
         const startTime = Date.now()
-      const promise = processMemberSources(memberId, sources)
-        .then((res) => {
-          processingCount--
-          if (res) {
-            const processingTime = Date.now() - startTime
-            totalProcessingTime += processingTime
-            processedMembersCount++
-
-            updatedMembersCount++
-          } else {
-            skippedMembersCount++
+        try {
+          const res = await processMemberSources(memberId, sources)
+          if (res) {
+            const processingTime = Date.now() - startTime
+            totalProcessingTime += processingTime
+            processedMembersCount++
+            updatedMembersCount++
+          } else {
+            skippedMembersCount++
+          }
+        } catch (err) {
+          svc.log.error(err, { memberId }, 'Error while processing member enrichment sources!')
+          failedMembersCount++
         }
-
-          // Report average processing time every REPORT_INTERVAL members
-          if (processedMembersCount > 0 && processedMembersCount % REPORT_INTERVAL === 0) {
-            const averageProcessingTime = totalProcessingTime / processedMembersCount
-            svc.log.info(
-              {
-                averageProcessingTime: `${(averageProcessingTime / 1000).toFixed(2)}s`,
-                processedMembers: processedMembersCount,
-                updatedMembers: updatedMembersCount,
-                skippedMembers: skippedMembersCount,
-                failedMembers: failedMembersCount,
-              },
-              'Processing time statistics',
-            )
-          }
-        })
-        .catch((err) => {
-          processingCount--
-          svc.log.error(err, { memberId }, 'Error while processing member enrichment sources!')
-          failedMembersCount++
-        })
-      pagePromises.push(promise)
-    }
-
-    await Promise.all(pagePromises)
-    pagePromises = []
+        // Report average processing time every REPORT_INTERVAL members
+        if (processedMembersCount > 0 && processedMembersCount % REPORT_INTERVAL === 0) {
+          const averageProcessingTime = totalProcessingTime / processedMembersCount
+          svc.log.info(
+            {
+              averageProcessingTime: `${(averageProcessingTime / 1000).toFixed(2)}s`,
+              processedMembers: processedMembersCount,
+              updatedMembers: updatedMembersCount,
+              skippedMembers: skippedMembersCount,
+              failedMembers: failedMembersCount,
+            },
+            'Processing time statistics',
+          )
+        }
+      }),
+    )
+
+    await Promise.all(tasks)

     // load next page
     members = await getEnrichableMembers(pageSize, members[members.length - 1])
This refactoring removes the manual concurrency management and leverages p-limit for better readability and maintainability.

63-154: Replace setImmediate with an async function
Using setImmediate is unnecessary in this context. Consider wrapping your main logic inside an async function for clarity.

Apply this diff to refactor the entry point:
-setImmediate(async () => {
+async function main() {
   await svc.init(false)
   // ... rest of the code ...
-  process.exit(0)
-})
+  process.exit(0)
+}
+
+main().catch((error) => {
+  svc.log.error({ error }, 'An unexpected error occurred')
+  process.exit(1)
+})
This change improves readability and provides a clear entry point for your script.
services/libs/data-access-layer/src/organizations/organizations.ts (3)
480-485: Optimize organization lookup by querying all verified identities at once
Currently, the code loops over verifiedIdentities and queries the database for each identity until a match is found. This can be inefficient, especially with many identities.

Consider modifying findOrgByVerifiedIdentity to accept multiple identities and perform a single database query. This would reduce database load and improve performance:
// Modify the function to accept an array of identities
async function findOrgByVerifiedIdentities(
  qx: QueryExecutor,
  tenantId: string,
  identities: IDbOrgIdentity[],
): Promise<IDbOrganization | null> {
  // Implementation to query all identities at once
}

// Use the new function
const existing = await findOrgByVerifiedIdentities(qe, tenantId, verifiedIdentities)

544-556: Batch insert organization identities to improve performance
The code inserts each identity individually using addOrgIdentity, which can be inefficient for multiple identities.

Consider modifying addOrgIdentity to accept an array of identities and perform a batch insert:
// Modify the function to accept multiple identities
async function addOrgIdentities(
  qe: QueryExecutor,
  identities: {
    organizationId: string
    tenantId: string
    identities: IDbOrgIdentity[]
    integrationId?: string
  },
): Promise<void> {
  // Implementation of batch insert
}

// Use the new function
await addOrgIdentities(qe, {
  organizationId: id,
  tenantId,
  identities: data.identities,
  integrationId,
})

560-563: Enhance error handling with more descriptive messages
The catch block logs the error and rethrows it, but additional context could aid in debugging.

Consider adding more context to the error message or handling specific error types:
catch (err) {
  log.error({ err, data }, 'Error while upserting an organization!')
  throw new Error(`Failed to upsert organization: ${err.message}`)
}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)
pnpm-lock.yaml
is excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (13)
backend/src/database/migrations/V1731052735__llm-prompt-history.sql
(1 hunks)services/apps/data_sink_worker/src/service/organization.service.ts
(2 hunks)services/apps/premium/members_enrichment_worker/src/activities/enrichment.ts
(3 hunks)services/apps/premium/members_enrichment_worker/src/bin/onboarding.ts
(1 hunks)services/apps/premium/members_enrichment_worker/src/types.ts
(1 hunks)services/libs/common_services/src/services/llm.service.ts
(1 hunks)services/libs/data-access-layer/src/old/apps/premium/members_enrichment_worker/index.ts
(4 hunks)services/libs/data-access-layer/src/organizations/attributesConfig.ts
(1 hunks)services/libs/data-access-layer/src/organizations/organizations.ts
(2 hunks)services/libs/types/src/enums/llm.ts
(1 hunks)services/libs/types/src/enums/organizations.ts
(1 hunks)services/libs/types/src/llm.ts
(1 hunks)services/libs/types/src/premium/enrichment.ts
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- services/apps/premium/members_enrichment_worker/src/types.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- services/libs/types/src/enums/llm.ts
- services/libs/types/src/premium/enrichment.ts
🧰 Additional context used
🪛 Biome
services/apps/premium/members_enrichment_worker/src/activities/enrichment.ts
[error] 171-171: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
[error] 182-182: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
[error] 186-186: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
[error] 195-195: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
[error] 199-199: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
🔇 Additional comments (17)
services/libs/types/src/enums/organizations.ts (2)
35-35
: Verify the distinction between email-based identifiers
The addition of EMAIL
to OrganizationIdentityType
looks good, but please verify that there's a clear distinction between this and OrganizationAttributeSource.EMAIL
to avoid confusion in the codebase.
✅ Verification successful
No overlap concerns between email-related enums
The enums serve distinct purposes with clear separation:
OrganizationIdentityType.EMAIL
represents an identity type for organizations (like usernames or domains)OrganizationAttributeSource.EMAIL
indicates the source of organization attributes (like GitHub or custom sources)
The distinction is further reinforced by their usage patterns:
- Identity types are used in identity management and organization structure
- Attribute sources are used in enrichment and data attribution contexts
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for usages of both email-related enums to ensure proper distinction
echo "Searching for OrganizationIdentityType.EMAIL usage:"
rg "OrganizationIdentityType.EMAIL" -A 3
echo -e "\nSearching for OrganizationAttributeSource.EMAIL usage:"
rg "OrganizationAttributeSource.EMAIL" -A 3
Length of output: 2525
Script:
#!/bin/bash
# Let's search for the actual usage context of both email-related enums
echo "Searching for files containing OrganizationIdentityType:"
rg -l "OrganizationIdentityType"
echo -e "\nSearching for files containing OrganizationAttributeSource:"
rg -l "OrganizationAttributeSource"
echo -e "\nLet's look at the enum definitions:"
cat services/libs/types/src/enums/organizations.ts
Length of output: 5087
43-43
: Consider using specific enrichment sources instead of a generic one
The ENRICHMENT
value seems redundant when there are already specific enrichment sources defined in OrganizationSource
(e.g., ENRICHMENT_PROGAI
, ENRICHMENT_CLEARBIT
). This could lead to ambiguity about which enrichment source was actually used.
Consider either:
- Using the specific enrichment sources directly, or
- Adding a comment explaining when to use this generic source versus the specific ones
services/libs/types/src/llm.ts (1)
1-19
: Well-structured base interfaces!
The base interfaces are well-designed with:
- Clear hierarchy through interface extension
- Descriptive property names
- Proper type definitions
services/libs/common_services/src/services/llm.service.ts (2)
1-27
: LGTM! Well-structured imports and interface definition.
The imports are properly organized, and the IBedrockClientCredentials
interface is clear and focused.
149-163
: 🛠️ Refactor suggestion
Replace any
type with proper typing
The method uses any
type which bypasses TypeScript's type checking. Consider defining proper types for better type safety.
public async consolidateMemberEnrichmentData(
memberId: string,
prompt: string,
- // eslint-disable-next-line @typescript-eslint/no-explicit-any
- ): Promise<ILlmResult<LlmMemberEnrichmentResult>> {
+ ): Promise<ILlmResult<LlmMemberEnrichmentResult>> {
const response = await this.queryLlm(LlmQueryType.MEMBER_ENRICHMENT, prompt, memberId)
const result = JSON.parse(response.answer)
return {
- result,
+ result: result as LlmMemberEnrichmentResult,
...response,
}
}
Likely invalid or redundant comment.
services/libs/data-access-layer/src/organizations/attributesConfig.ts (2)
234-234
: LGTM! Appropriate priority placement for enrichment source.
The placement of OrganizationAttributeSource.ENRICHMENT
at the lowest priority is correct, ensuring that enriched data doesn't override data from more authoritative sources like custom inputs or direct integrations.
234-234
: Verify the OrganizationAttributeSource enum update.
Let's ensure the ENRICHMENT value is properly defined in the types package.
✅ Verification successful
OrganizationAttributeSource.ENRICHMENT is properly defined in the types package
The verification confirms that ENRICHMENT
is correctly defined as 'enrichment'
in the OrganizationAttributeSource
enum within services/libs/types/src/enums/organizations.ts
.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the ENRICHMENT value is properly defined in the OrganizationAttributeSource enum
# Check the enum definition
rg -A 10 "export enum OrganizationAttributeSource" "services/libs/types/src/enums/organizations.ts"
Length of output: 259
services/libs/data-access-layer/src/old/apps/premium/members_enrichment_worker/index.ts (2)
546-552
: LGTM: Changes align with existing review comments
The integration of resetMemberEnrichedAt
is correct, and the existing review comments about error handling are still applicable.
Also applies to: 562-571
589-603
: Verify callers of findMemberEnrichmentCacheDb
The function signature has changed to accept an array of sources instead of a single source. This change requires verification of all callers to ensure they've been updated accordingly.
✅ Verification successful
Let me search for more context about the callers by examining the implementation.
Based on the context gathered, I can now generate the final response:
Function signature change is correctly handled
The only caller of findMemberEnrichmentCacheDb
is a wrapper function findMemberEnrichmentCache
in services/apps/premium/members_enrichment_worker/src/activities/enrichment.ts
, which already accepts an array of sources and correctly passes them to the modified function. The change is consistent throughout the codebase.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for function calls to findMemberEnrichmentCacheDb
ast-grep --pattern 'findMemberEnrichmentCacheDb($$$)'
Length of output: 839
Script:
#!/bin/bash
# Get context around the caller
rg -B 5 -A 5 "findMemberEnrichmentCacheDb" services/apps/premium/members_enrichment_worker/src/activities/enrichment.ts
Length of output: 872
services/apps/premium/members_enrichment_worker/src/activities/enrichment.ts (3)
1-32
: LGTM! Import statements are well-organized
The new imports are properly structured and necessary for the added functionality.
113-116
: LGTM! Function signature updated to handle multiple sources
The change to accept an array of sources aligns well with the new multi-source data consolidation approach.
161-204
: 🛠️ Refactor suggestion
Optimize performance by avoiding the delete
operator
The data preparation logic uses the delete
operator which can impact performance. Consider using object destructuring or setting properties to undefined instead.
Apply these changes:
- delete n.contributions
+ n.contributions = undefined
- delete n.reach
+ n.reach = undefined
- delete normalized.contributions
+ normalized.contributions = undefined
- delete normalized.reach
+ normalized.reach = undefined
Likely invalid or redundant comment.
🧰 Tools
🪛 Biome
[error] 171-171: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
[error] 182-182: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
[error] 186-186: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
[error] 195-195: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
[error] 199-199: Avoid the delete operator which can impact performance.
Unsafe fix: Use an undefined assignment instead.
(lint/performance/noDelete)
services/apps/data_sink_worker/src/service/organization.service.ts (1)
36-42
: Ensure that error handling is appropriately managed after removing the try-catch block
In the findOrCreate
method, the previous implementation included a try-catch block for error handling within the transaction. With the current changes, the try-catch block has been removed. Please ensure that any exceptions thrown within the transaction are properly handled at a higher level to prevent unhandled exceptions.
services/apps/premium/members_enrichment_worker/src/bin/onboarding.ts (1)
7-16
: Service configuration is appropriately customized
The service environment variables and configurations have been correctly set to ensure that only the necessary components are enabled for this script.
services/libs/data-access-layer/src/organizations/organizations.ts (3)
1-2
: Imports are appropriate for new functionality
The added imports generateUUIDv1
, websiteNormalizer
, and getServiceChildLogger
are necessary for the new findOrCreateOrganization
function and enhance modularity.
516-527
: Validate displayName assignment for new organizations
When creating a new organization, displayName
is set to firstVerified.value
, which may not always be the desired display name.
Ensure that firstVerified.value
is appropriate for the displayName
. If value
might not represent a suitable display name (e.g., a URL or code), consider checking and assigning a more user-friendly name if available.
498-501
:
Ensure the organization is updated regardless of recent updates
The updateOrganization
function checks if the organization's updatedAt
timestamp is older than one minute ago, which may prevent updates if the organization was recently modified.
Consider removing or adjusting the timestamp check to ensure updates are applied as intended:
// In updateOrganization function, remove the updatedAt condition
- where id = $(organizationId) and "updatedAt" <= $(oneMinuteAgo)
+ where id = $(organizationId)
Alternatively, if the timestamp check is required for concurrency control, implement optimistic locking or handle the case when the update doesn't occur due to the timestamp condition.
Likely invalid or redundant comment.
-- backup members table
create table members_backup_14_11_2024 as
select *
from members
with no data;

-- Copy all data
insert into members_backup_14_11_2024
select *
from members;

-- backup memberIdentities table
create table member_identities_backup_14_11_2024 as
select *
from "memberIdentities"
with no data;

-- Copy all data
insert into member_identities_backup_14_11_2024
select *
from "memberIdentities";

-- backup memberOrganizations table
create table member_organizations_backup_14_11_2024 as
select *
from "memberOrganizations"
with no data;

-- Copy all data
insert into member_organizations_backup_14_11_2024
select *
from "memberOrganizations";
🛠️ Refactor suggestion
Improve backup strategy
- Use dynamic date generation instead of hardcoding
- Add row count validation
- Consider cleanup strategy for old backups
+do $$
+declare
+ backup_date text := to_char(current_timestamp, 'DD_MM_YYYY');
+ members_count int;
+ member_identities_count int;
+ member_organizations_count int;
+begin
+ -- Get original counts
+ select count(*) into members_count from members;
+ select count(*) into member_identities_count from "memberIdentities";
+ select count(*) into member_organizations_count from "memberOrganizations";
+
-- backup members table
- create table members_backup_14_11_2024 as
+ execute format('create table members_backup_%s as
select *
from members
- with no data;
+ with no data', backup_date);
-- Copy all data
- insert into members_backup_14_11_2024
+ execute format('insert into members_backup_%s
select *
- from members;
+ from members', backup_date);
+ -- Validate backup
+ execute format('
+ if (select count(*) from members_backup_%s) != $1 then
+ raise exception ''Members backup validation failed'';
+ end if
+ ', backup_date) using members_count;
-- Similar changes for other tables...
+ -- Cleanup old backups (older than 30 days)
+ for backup_table in
+ select tablename
+ from pg_tables
+ where tablename like 'members_backup_%'
+ or tablename like 'member_identities_backup_%'
+ or tablename like 'member_organizations_backup_%'
+ loop
+ if to_date(right(backup_table, 10), 'DD_MM_YYYY') < current_date - interval '30 days' then
+ execute format('drop table if exists %I', backup_table);
+ end if;
+ end loop;
+end $$;
Committable suggestion skipped: line range outside the PR's diff.
create table "llmPromptHistory" ( | ||
id bigserial primary key, | ||
type varchar(255) not null, | ||
model text not null, | ||
"entityId" text null, | ||
metadata jsonb null, | ||
prompt text not null, | ||
answer text not null, | ||
"inputTokenCount" int not null, | ||
"outputTokenCount" int not null, | ||
"responseTimeSeconds" decimal not null, | ||
"createdAt" timestamptz not null default now() | ||
); |
🛠️ Refactor suggestion
Consider additional constraints and optimizations
- The type column could be more efficient as an enum type
- The model column should have a CHECK constraint to validate allowed models
- The metadata column should have validation to ensure JSON structure
+create type llm_prompt_type as enum (
+ 'organization_merge_suggestion',
+ 'member_merge_suggestion'
+ -- add other types as needed
+);
+
create table "llmPromptHistory" (
id bigserial primary key,
- type varchar(255) not null,
+ type llm_prompt_type not null,
model text not null,
+ check (model in ('anthropic.claude-v2', 'anthropic.claude-v1')),
"entityId" text null,
- metadata jsonb null,
+ metadata jsonb null check (jsonb_typeof(metadata) = 'object'),
prompt text not null,
answer text not null,
"inputTokenCount" int not null,
"outputTokenCount" int not null,
"responseTimeSeconds" decimal not null,
"createdAt" timestamptz not null default now()
);
Committable suggestion skipped: line range outside the PR's diff.
public async queryLlm(
  type: LlmQueryType,
  prompt: string,
  entityId?: string,
  metadata?: Record<string, unknown>,
  saveHistory = true,
): Promise<ILlmResponse> {
  const settings = LLM_SETTINGS[type]
  if (!settings) {
    throw new Error(`No settings found for LLM query type: ${type}`)
  }

  const client = this.client(settings)

  const start = performance.now()
  const end = () => {
    const end = performance.now()
    const duration = end - start
    return Math.ceil(duration / 1000)
  }

  const command = new InvokeModelCommand({
    body: JSON.stringify({
      messages: [
        {
          role: 'user',
          content: [
            {
              type: 'text',
              text: prompt,
            },
          ],
        },
      ],
      // eslint-disable-next-line @typescript-eslint/no-explicit-any
      ...(settings.arguments as any),
    }),
    modelId: settings.modelId,
    accept: 'application/json',
    contentType: 'application/json',
  })

  let res: InvokeModelCommandOutput
  try {
    res = await client.send(command)
  } catch (err) {
    this.log.error(err, { settings, prompt }, 'Failed to query LLM!')
    throw err
  }

  const body = JSON.parse(res.body.transformToString())
  const responseTimeSeconds = end()

  const inputTokenCount = body.usage.input_tokens
  const outputTokenCount = body.usage.output_tokens
  const answer = body.content[0].text

  const pricing = LLM_MODEL_PRICING_MAP[settings.modelId]

  const inputCost = (inputTokenCount / 1000) * pricing.costPer1000InputTokens
  const outputCost = (outputTokenCount / 1000) * pricing.costPer1000OutputTokens
  const totalCost = inputCost + outputCost

  this.log.info({ type, entityId, inputCost, outputCost, totalCost }, 'Estimated LLM cost!')

  const result = {
    prompt,
    answer,
    inputTokenCount,
    outputTokenCount,
    responseTimeSeconds,
    model: settings.modelId,
  }

  if (saveHistory) {
    try {
      await this.repo.insertPromptHistoryEntry(type, settings.modelId, result, entityId, metadata)
    } catch (err) {
      this.log.error(err, 'Failed to save LLM prompt history entry!')
      throw err
    }
  }

  return result
}
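For illustration, the cost arithmetic above factored into a small helper (a sketch; estimateLlmCost is not part of the PR):

// Hypothetical helper mirroring the cost arithmetic in queryLlm above.
function estimateLlmCost(
  inputTokenCount: number,
  outputTokenCount: number,
  pricing: { costPer1000InputTokens: number; costPer1000OutputTokens: number },
): number {
  const inputCost = (inputTokenCount / 1000) * pricing.costPer1000InputTokens
  const outputCost = (outputTokenCount / 1000) * pricing.costPer1000OutputTokens
  return inputCost + outputCost
}

// e.g. estimateLlmCost(1200, 300, LLM_MODEL_PRICING_MAP[settings.modelId])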
🛠️ Refactor suggestion
Add rate limiting and timeout configuration
The LLM query implementation should include:
- Rate limiting to prevent overwhelming the AWS Bedrock service
- Timeout configuration to handle slow responses
+import { setTimeout } from 'timers/promises'
+
export class LlmService extends LoggerBase {
+ private static readonly DEFAULT_TIMEOUT_MS = 30000
+ private static readonly DEFAULT_RETRY_COUNT = 3
+ private static readonly DEFAULT_RETRY_DELAY_MS = 1000
+
public async queryLlm(
type: LlmQueryType,
prompt: string,
entityId?: string,
metadata?: Record<string, unknown>,
saveHistory = true,
+ timeoutMs = LlmService.DEFAULT_TIMEOUT_MS,
): Promise<ILlmResponse> {
// ... existing code ...
let res: InvokeModelCommandOutput
+ let lastError: Error
+
+ for (let attempt = 1; attempt <= LlmService.DEFAULT_RETRY_COUNT; attempt++) {
try {
- res = await client.send(command)
+ res = await Promise.race([
+ client.send(command),
+ setTimeout(timeoutMs).then(() => {
+ throw new Error(`LLM query timed out after ${timeoutMs}ms`)
+ }),
+ ])
+ break
} catch (err) {
- this.log.error(err, { settings, prompt }, 'Failed to query LLM!')
- throw err
+ lastError = err
+ if (attempt === LlmService.DEFAULT_RETRY_COUNT) {
+ this.log.error(err, { settings, prompt, attempts: attempt }, 'Failed to query LLM after all retries!')
+ throw err
+ }
+ this.log.warn(
+ { settings, prompt, attempt, error: err.message },
+ 'LLM query failed, retrying...',
+ )
+ await setTimeout(LlmService.DEFAULT_RETRY_DELAY_MS)
}
}
Committable suggestion skipped: line range outside the PR's diff.
public async consolidateMemberEnrichmentData(
  memberId: string,
  prompt: string,
  // eslint-disable-next-line @typescript-eslint/no-explicit-any
): Promise<ILlmResult<LlmMemberEnrichmentResult>> {
  const response = await this.queryLlm(LlmQueryType.MEMBER_ENRICHMENT, prompt, memberId)

  const result = JSON.parse(response.answer)

  return {
    result,
    ...response,
  }
}
}
💡 Codebase verification
Add data sanitization and redaction for sensitive LLM data logging
The LLM service currently logs sensitive information without proper sanitization, including:
- Complete prompt content
- Full response content
- Token counts and cost details
- Entity IDs and metadata
While the rest of the codebase consistently uses sanitization utilities like sanitizeHtml and redactNullByte, the LLM service lacks similar protections. To maintain security best practices, you should:
-
Create a sanitization helper for LLM data that:
- Redacts sensitive content from prompts and responses
- Masks token counts and cost information
- Sanitizes entity IDs and metadata before logging
-
Update the logging calls in llm.service.ts to use the sanitization helper:
this.log.info({
type,
entityId,
// Sanitize cost info
costs: maskSensitiveData({ inputCost, outputCost, totalCost })
}, 'Estimated LLM cost!')
- Consider moving sensitive data logging to debug level instead of info level for better control.
🔗 Analysis chain
Enhance security measures for sensitive data handling
Consider implementing these security enhancements:
- Sanitize sensitive data before logging
- Implement AWS credential rotation
- Add input validation for prompts to prevent injection attacks
Let me analyze the logging patterns in the LLM service code and compare it with the rest of the codebase to verify the security concerns.
From the logs search results, I can see that:
- The LLM service logs sensitive information like:
- Full prompt content
- Full response content
- Cost information including token counts
- Entity IDs and metadata
- The logging patterns in other services show they generally:
- Redact sensitive data
- Log at appropriate levels (debug/info/warn/error)
- Include structured metadata
- Focus on operational data rather than payload content
Let me verify if there are any sanitization helpers being used in the codebase.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check for potential sensitive data logging
rg -i "log\.(info|warn|error|debug)" -A 2 -B 2
Length of output: 492624
Script:
#!/bin/bash
# Look for sanitization/redaction utilities
rg -g '!dist' -g '!node_modules' -g '!*.map' "sanitize|redact|mask"
Length of output: 18758
export async function fetchMemberDataForLLMSquashing(
  db: DbStore,
  memberId: string,
): Promise<IMemberOriginalData | null> {
  const result = await db.connection().oneOrNone(
    `
    with member_orgs as (select distinct mo."memberId",
                                         mo."organizationId" as "orgId",
                                         o."displayName"     as "orgName",
                                         mo.title            as "jobTitle",
                                         mo."dateStart",
                                         mo."dateEnd",
                                         mo.source
                         from "memberOrganizations" mo
                                  inner join organizations o on mo."organizationId" = o.id
                         where mo."memberId" = $(memberId)
                           and mo."deletedAt" is null
                           and o."deletedAt" is null)
    select m."displayName",
           m.attributes,
           m."manuallyChangedFields",
           m."tenantId",
           coalesce(
                   (select json_agg(
                                   (select row_to_json(r)
                                    from (select mi.type,
                                                 mi.platform,
                                                 mi.value) r)
                           )
                    from "memberIdentities" mi
                    where mi."memberId" = m.id
                      and verified = true), '[]'::json) as identities,
           case
               when exists (select 1 from member_orgs where "memberId" = m.id)
                   then (
                   select json_agg(
                                  (select row_to_json(r)
                                   from (select mo."orgId",
                                                mo."orgName",
                                                mo."jobTitle",
                                                mo."dateStart",
                                                mo."dateEnd",
                                                mo.source) r)
                          )
                   from member_orgs mo
                   where mo."memberId" = m.id
               )
               else '[]'::json
               end as organizations
    from members m
    where m.id = $(memberId)
      and m."deletedAt" is null
    group by m.id, m."displayName", m.attributes, m."manuallyChangedFields";
    `,
    {
      memberId,
    },
  )

  return result
}
🛠️ Refactor suggestion
Optimize query performance in fetchMemberDataForLLMSquashing
The query structure could be optimized for better performance:
- The GROUP BY clause includes unnecessary columns, as m.id is sufficient for grouping.
- Consider adding index hints for the JOIN operations.
Apply this diff to optimize the query:
- group by m.id, m."displayName", m.attributes, m."manuallyChangedFields";
+ group by m.id;
Additionally, consider adding the following indexes if they don't exist:
CREATE INDEX IF NOT EXISTS idx_member_organizations_member_id ON "memberOrganizations" ("memberId");
CREATE INDEX IF NOT EXISTS idx_member_identities_member_id ON "memberIdentities" ("memberId");
  Answer with JSON only and nothing else. Ensure the response is complete and valid JSON.
  `
const data = await llmService.consolidateMemberEnrichmentData(memberId, prompt) |
Add error handling for LLM service call
The LLM service call lacks error handling. Add a try-catch block to handle potential failures gracefully.
Add error handling:
+ try {
const data = await llmService.consolidateMemberEnrichmentData(memberId, prompt)
+ } catch (error) {
+ svc.log.error({ memberId, error }, 'Failed to consolidate member enrichment data')
+ return false
+ }
const llmService = new LlmService(
  svc.postgres.writer,
  {
    accessKeyId: process.env['CROWD_AWS_BEDROCK_ACCESS_KEY_ID'],
    secretAccessKey: process.env['CROWD_AWS_BEDROCK_SECRET_ACCESS_KEY'],
  },
  svc.log,
)
Validate AWS credentials before LLM service initialization
The code directly uses AWS credentials from environment variables without validation. Consider adding validation to prevent runtime errors.
Add credential validation:
+ const awsAccessKeyId = process.env['CROWD_AWS_BEDROCK_ACCESS_KEY_ID']
+ const awsSecretKey = process.env['CROWD_AWS_BEDROCK_SECRET_ACCESS_KEY']
+ if (!awsAccessKeyId || !awsSecretKey) {
+ throw new Error('AWS Bedrock credentials are not configured')
+ }
const llmService = new LlmService(
svc.postgres.writer,
{
- accessKeyId: process.env['CROWD_AWS_BEDROCK_ACCESS_KEY_ID'],
- secretAccessKey: process.env['CROWD_AWS_BEDROCK_SECRET_ACCESS_KEY'],
+ accessKeyId: awsAccessKeyId,
+ secretAccessKey: awsSecretKey,
},
svc.log,
)
            platform: toUpdate.p,
            type: toUpdate.t as MemberIdentityType,
            value: toUpdate.v,
            verified: toUpdate.ve,
          }),
        )
      } else {
        svc.log.warn({ memberId, toUpdate }, 'Unknown identity type!')
      }
    }
  }

  if (data.result.changes.identities.new) {
    for (const toAdd of data.result.changes.identities.new) {
      if (identityTypes.includes(toAdd.t as MemberIdentityType)) {
        svc.log.info({ memberId, toAdd }, 'Adding new identity!')
        promises.push(
          upsertMemberIdentity(qx, {
            memberId,
            tenantId: existingMemberData.tenantId,
            platform: toAdd.p,
            type: toAdd.t as MemberIdentityType,
            value: toAdd.v,
            verified: toAdd.ve,
          }),
        )
      } else {
        svc.log.warn({ memberId, toAdd }, 'Unknown identity type!')
      }
    }
  }
}

// process organizations
if (data.result.changes.organizations) {
  const sources = Object.values(OrganizationSource)

  if (data.result.changes.organizations.newConns) {
    for (const conn of data.result.changes.organizations.newConns) {
      if (sources.includes(conn.s as OrganizationSource)) {
        svc.log.info({ memberId, conn }, 'Adding new connection to existing organization!')
        promises.push(
          insertWorkExperience(
            tx.transaction(),
            memberId,
            conn.orgId,
            conn.t,
            conn.ds,
            conn.de,
            conn.s as OrganizationSource,
          ),
        )
      } else {
        svc.log.warn({ memberId, conn }, 'Unknown organization source!')
      }
    }
  }

  if (data.result.changes.organizations.newOrgs) {
    for (const org of data.result.changes.organizations.newOrgs) {
      svc.log.info({ memberId, org }, 'Adding new organization!')
      promises.push(
        findOrCreateOrganization(
          qx,
          existingMemberData.tenantId,
          OrganizationAttributeSource.ENRICHMENT,
          {
            displayName: org.n,
            identities: org.i.map((i) => {
              return {
                type: i.t as OrganizationIdentityType,
                platform: i.p,
                value: i.v,
                verified: i.ve,
              }
            }),
          },
        ).then((orgId) =>
          insertWorkExperience(
            tx.transaction(),
            memberId,
            orgId,
            org.conn.t,
            org.conn.ds,
            org.conn.de,
            org.conn.s as OrganizationSource,
          ),
        ),
      )
    }
  }
}

// also touch members.lastEnriched date
promises.push(
  updateLastEnrichedDate(tx.transaction(), memberId, existingMemberData.tenantId),
)

await Promise.all(promises)
})
Add transaction error handling and sanitize logs
The transaction block handles multiple database operations but lacks structured error handling. Additionally, some log statements might expose sensitive data.
Add error handling and sanitize logs:
+ try {
await svc.postgres.writer.transactionally(async (tx) => {
// ... existing code ...
})
+ } catch (error) {
+ svc.log.error({ memberId, error: error.message }, 'Failed to process member data updates')
+ throw error
+ }
// Update log statements to exclude sensitive data
- svc.log.info({ memberId, toUpdate }, 'Updating verified flag for identity!')
+ svc.log.info({ memberId, platform: toUpdate.p }, 'Updating verified flag for identity!')
- svc.log.info({ memberId, toAdd }, 'Adding new identity!')
+ svc.log.info({ memberId, platform: toAdd.p }, 'Adding new identity!')
Committable suggestion skipped: line range outside the PR's diff.
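Since the committable suggestion could not be generated, here is a minimal standalone sketch of the proposed pattern (illustrative only; the wrapper name applyMemberChanges is hypothetical, while svc and the transaction helper mirror the surrounding code):

// Illustrative sketch: wrap the transaction so a failed identity or organization
// update rolls back the whole batch and is logged without identity values.
async function applyMemberChanges(memberId: string): Promise<void> {
  try {
    await svc.postgres.writer.transactionally(async (tx) => {
      const promises: Promise<unknown>[] = []
      // ... collect identity and organization updates into `promises` as above ...
      await Promise.all(promises)
    })
  } catch (error) {
    // Log only non-sensitive fields
    svc.log.error({ memberId, error: error.message }, 'Failed to process member data updates')
    throw error
  }
}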
async function getEnrichableMembers(limit: number, lastMemberId?: string): Promise<string[]> {
  const query = `
    -- only use members that have more than one enrichment source
    with members_with_sources as (select distinct "memberId", count(*)
                                  from "memberEnrichmentCache"
                                  group by "memberId"
                                  having count(*) > 1),
    -- also only use members with more activities than the configured minimum
    members_with_activities as (select distinct msa."memberId", sum("activityCount") as total_activities
                                from members_with_sources ms
                                inner join "memberSegmentsAgg" msa on msa."memberId" = ms."memberId"
                                -- only consider subprojects otherwise we count some activities multiple times
                                inner join segments s on s.id = msa."segmentId" and s."tenantId" = $(tenantId) and s.type = 'subproject'
                                group by msa."memberId"
                                having sum("activityCount") > $(minMemberActivities))
    select m.id
    from members m
    inner join members_with_activities ma on m.id = ma."memberId"
    where m."deletedAt" is null and m."tenantId" = $(tenantId)
    ${lastMemberId ? `and m.id > $(lastMemberId)` : ''}
    and (m."lastEnriched" is null
         or m."lastEnriched" < now() - interval '3 months')
    order by ma.total_activities desc, m.id
    limit $(limit)
  `

  return (
    await svc.postgres.writer
      .connection()
      .any(query, { lastMemberId, limit, tenantId, minMemberActivities })
  ).map((row) => row.id)
}
|
Add error handling for database query
The function `getEnrichableMembers` lacks error handling for the database query. To make the script more robust, consider adding a try-catch block to handle potential query failures.
Apply this diff to add error handling:
async function getEnrichableMembers(limit: number, lastMemberId?: string): Promise<string[]> {
const query = `
-- SQL query omitted for brevity
`
+ try {
return (
await svc.postgres.writer
.connection()
.any(query, { lastMemberId, limit, tenantId, minMemberActivities })
).map((row) => row.id)
+ } catch (error) {
+ svc.log.error({ error }, 'Failed to fetch enrichable members')
+ throw error
+ }
}
for (const identity of data.identities.filter((i) =>
  [
    OrganizationIdentityType.PRIMARY_DOMAIN,
    OrganizationIdentityType.ALTERNATIVE_DOMAIN,
  ].includes(i.type),
)) {
  identity.value = websiteNormalizer(identity.value, false)
}
🛠️ Refactor suggestion
Avoid mutating input data directly to prevent unintended side effects
The loop normalizes website identities by modifying `identity.value` directly in `data.identities`. This could lead to unintended side effects if `data` is used elsewhere after this function.
Consider creating a copy of the identities or the values before normalization to maintain data integrity:
- for (const identity of data.identities.filter((i) =>
+ const identitiesCopy = data.identities.map(identity => ({ ...identity }))
+ for (const identity of identitiesCopy.filter((i) =>
[
OrganizationIdentityType.PRIMARY_DOMAIN,
OrganizationIdentityType.ALTERNATIVE_DOMAIN,
].includes(i.type),
)) {
identity.value = websiteNormalizer(identity.value, false)
}
+ // Use identitiesCopy for further processing instead of data.identities
Committable suggestion skipped: line range outside the PR's diff.
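Because the committable suggestion was skipped here as well, a minimal non-mutating sketch (illustrative only, assuming the same websiteNormalizer signature used above):

// Illustrative sketch: build a normalized copy instead of mutating data.identities
const normalizedIdentities = data.identities.map((identity) =>
  [
    OrganizationIdentityType.PRIMARY_DOMAIN,
    OrganizationIdentityType.ALTERNATIVE_DOMAIN,
  ].includes(identity.type)
    ? { ...identity, value: websiteNormalizer(identity.value, false) }
    : identity,
)
// Downstream steps would then read from normalizedIdentities instead of data.identities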
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (1)
services/apps/premium/members_enrichment_worker/src/bin/onboarding.ts (1)
9-15: Consider extracting configuration to a dedicated config file

While the configuration setup is clear, having environment variables and service configuration inline makes it less maintainable. Consider extracting this to a dedicated configuration file.

+// config/worker.ts
+export const workerConfig = {
+  envvars: [],
+  producer: { enabled: false },
+  redis: { enabled: true },
+  temporal: { enabled: false },
+  questdb: { enabled: false },
+  opensearch: { enabled: false },
+}

-process.env['CROWD_TEMPORAL_TASKQUEUE'] = 'not-important'
-svc.config.envvars = []
-svc.config.producer = { enabled: false }
-svc.config.redis = { enabled: true }
-svc.config.temporal = { enabled: false }
-svc.config.questdb = { enabled: false }
-svc.options.opensearch = { enabled: false }
+import { workerConfig } from './config/worker'
+process.env['CROWD_TEMPORAL_TASKQUEUE'] = 'not-important'
+Object.assign(svc.config, workerConfig)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
services/apps/premium/members_enrichment_worker/package.json (2 hunks)
services/apps/premium/members_enrichment_worker/src/bin/onboarding.ts (1 hunks)
services/apps/premium/members_enrichment_worker/src/workflows/enrichMember.ts (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- services/apps/premium/members_enrichment_worker/package.json
- services/apps/premium/members_enrichment_worker/src/workflows/enrichMember.ts
🔇 Additional comments (2)
services/apps/premium/members_enrichment_worker/src/bin/onboarding.ts (2)
17-24: Refer to previous review comment about command-line argument handling
The previous review comment about enhancing command-line argument handling with proper validation and error messaging is still applicable.

29-61: Refer to previous review comments about query optimization and error handling
The previous review comments about adding type definitions, query optimization, and error handling are still applicable.
const minMemberActivities = 100
const maxConcurrentProcessing = 5
🛠️ Refactor suggestion
Make constants configurable and add documentation
The hardcoded values should be configurable and documented:
-const minMemberActivities = 100
-const maxConcurrentProcessing = 5
+/**
+ * Minimum number of activities a member must have to be considered for enrichment
+ * @default 100
+ */
+const minMemberActivities = Number(process.env.MIN_MEMBER_ACTIVITIES ?? 100)
+
+/**
+ * Maximum number of members that can be processed concurrently
+ * @default 5
+ */
+const maxConcurrentProcessing = Number(process.env.MAX_CONCURRENT_PROCESSING ?? 5)
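One caveat with the env-based defaults above: Number() returns NaN for malformed values and 0 for an empty string, neither of which ?? catches. A small guard keeps the defaults safe (envInt is an illustrative helper, not part of the PR):

// Illustrative helper: fall back to the default when the variable is missing
// or not a valid positive integer
function envInt(name: string, defaultValue: number): number {
  const parsed = Number(process.env[name])
  return Number.isInteger(parsed) && parsed > 0 ? parsed : defaultValue
}

const minMemberActivities = envInt('MIN_MEMBER_ACTIVITIES', 100)
const maxConcurrentProcessing = envInt('MAX_CONCURRENT_PROCESSING', 5)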
setImmediate(async () => {
  await svc.init(false)

  let processingCount = 0
  let updatedMembersCount = 0
  let skippedMembersCount = 0
  let failedMembersCount = 0

  let totalProcessingTime = 0
  let processedMembersCount = 0
  const REPORT_INTERVAL = 10

  const pageSize = 100
  let members = await getEnrichableMembers(pageSize)
  let pagePromises: Promise<void>[] = []
  while (members.length > 0) {
    svc.log.info({ memberCount: members.length }, 'Processing members!')
    // process members just like in enrichMember workflow
    for (const memberId of members) {
      while (processingCount >= maxConcurrentProcessing) {
        await timeout(100)
      }

      processingCount++
      const startTime = Date.now()

      const promise = processMemberSources(memberId, sources)
        .then((res) => {
          processingCount--
          if (res) {
            const processingTime = Date.now() - startTime
            totalProcessingTime += processingTime
            processedMembersCount++

            updatedMembersCount++
          } else {
            skippedMembersCount++
          }

          // Report average processing time every REPORT_INTERVAL members
          if (processedMembersCount > 0 && processedMembersCount % REPORT_INTERVAL === 0) {
            const averageProcessingTime = totalProcessingTime / processedMembersCount
            svc.log.info(
              {
                averageProcessingTime: `${(averageProcessingTime / 1000).toFixed(2)}s`,
                processedMembers: processedMembersCount,
                updatedMembers: updatedMembersCount,
                skippedMembers: skippedMembersCount,
                failedMembers: failedMembersCount,
              },
              'Processing time statistics',
            )
          }
        })
        .catch((err) => {
          processingCount--
          svc.log.error(err, { memberId }, 'Error while processing member enrichment sources!')
          failedMembersCount++
        })
      pagePromises.push(promise)
    }

    await Promise.all(pagePromises)
    pagePromises = []
    // load next page
    members = await getEnrichableMembers(pageSize, members[members.length - 1])

    svc.log.info(
      {
        updatedMembersCount,
        skippedMembersCount,
        failedMembersCount,
        averageProcessingTime: `${(totalProcessingTime / processedMembersCount / 1000).toFixed(2)}s`,
      },
      'Current statistics!',
    )
  }

  svc.log.info(
    {
      updatedMembersCount,
      skippedMembersCount,
      failedMembersCount,
      averageProcessingTime: `${(totalProcessingTime / processedMembersCount / 1000).toFixed(2)}s`,
      totalProcessedMembers: processedMembersCount,
      totalProcessingTime: `${(totalProcessingTime / 1000).toFixed(2)}s`,
    },
    'Final statistics!',
  )

  process.exit(0)
})
Add graceful shutdown handling
The script should handle process signals (SIGTERM, SIGINT) gracefully to ensure proper cleanup and completion of in-progress tasks.
+let isShuttingDown = false
+
+async function shutdown() {
+ if (isShuttingDown) return
+ isShuttingDown = true
+
+ svc.log.info('Graceful shutdown initiated')
+
+ // Wait for in-progress tasks
+ while (processingCount > 0) {
+ svc.log.info({ processingCount }, 'Waiting for in-progress tasks to complete')
+ await timeout(1000)
+ }
+
+ // Log final statistics
+ svc.log.info(
+ {
+ updatedMembersCount,
+ skippedMembersCount,
+ failedMembersCount,
+ averageProcessingTime: `${(totalProcessingTime / processedMembersCount / 1000).toFixed(2)}s`,
+ totalProcessedMembers: processedMembersCount,
+ totalProcessingTime: `${(totalProcessingTime / 1000).toFixed(2)}s`,
+ },
+ 'Final statistics before shutdown!',
+ )
+
+ await svc.destroy()
+ process.exit(0)
+}
+
+process.on('SIGTERM', shutdown)
+process.on('SIGINT', shutdown)
+
setImmediate(async () => {
await svc.init(false)
// ... existing code ...
while (members.length > 0) {
+ if (isShuttingDown) break
// ... existing code ...
}
// ... existing code ...
})
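One thing to verify before adopting this sketch: shutdown() reads processingCount and the other counters, which the current script declares inside the setImmediate callback. For the handler to compile, they would need to move to module scope, e.g.:

// Assumed hoisting (illustrative): shared state visible to both the main loop
// and the shutdown handler
let processingCount = 0
let updatedMembersCount = 0
let skippedMembersCount = 0
let failedMembersCount = 0
let totalProcessingTime = 0
let processedMembersCount = 0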
Changes proposed ✍️

What
copilot:summary

copilot:poem

Why

How
copilot:walkthrough

Checklist ✅
- Label appropriately with Feature, Improvement, or Bug.

Summary by CodeRabbit

Release Notes

New Features
- Introduced the llmPromptHistory table for tracking prompts and responses.

Improvements

Documentation