Merge branch 'main' into member-link-toast-message
peoray authored Oct 16, 2023
2 parents c65d438 + 372d2f2 commit 5f120e8
Showing 26 changed files with 171 additions and 94 deletions.
10 changes: 5 additions & 5 deletions README.md
@@ -40,18 +40,18 @@
- [Book a call](#📞-book-a-call)

## About crowd.dev
crowd.dev is the developer data platform (DDP) that lets companies centralize all touch points developers have with their product and brand - be it in community (e.g. Stack Overflow or Reddit), product (open-source or SaaS), or commercial channels (e.g. HubSpot). The platform pulls data from a variety of different sources, normalizes it, matches identities across platforms, and enriches it with 3rd party data. The result is a unified 360-view of who the developers are that engage with your product and community, which companies they work for, and where they stand in their personal customer journey.
crowd.dev is the developer data platform (DDP) that lets companies centralize all touch points developers have with their product and brand, be it in community (e.g. Stack Overflow or Reddit), product (open-source or SaaS), or commercial channels (e.g. HubSpot). The platform pulls data from a variety of different sources, normalizes it, matches identities across platforms, and enriches it with 3rd party data. The result is a unified 360-view of who the developers are that engage with your product and community, which companies they work for, and where they stand in their personal customer journey.

crowd.dev is open-source, built with developers in mind, available for both hosted and self-hosted deployments, open to extensions, and offers full control over your data.

**To our **users**:**
- You can get actively involved, contribute to our roadmap, and turn crowd.dev into the tool you always wanted.
- We are open regarding what we are building, allowing you to take a look inside, and making sure we handle your data in a privacy-preserving way.
- You will never be locked in by us. Our interests as a company are aligned with yours and we need to make sure that we always deliver enough value to your with our commercial offering in relation to our pricing.
- You will never be locked in by us. Our interests as a company are aligned with you and we need to make sure that we always deliver enough value to you with our commercial offering in relation to our pricing.

**To our developer community:**
- You can self-host crowd.dev to centralize data for your community or company while keeping full control over your data.
- Our product is build for extensibilty. If you can think of any use cases that you want to build with the data we collect and store for you, please go ahead and build it! We will be here to help out if you need us.
- Our product is built for extensibilty. If you can think of any use cases that you want to build with the data we collect and store for you, please go ahead and build it! We will be here to help out if you need us.
- You can actively contribute to crowd.dev (e.g. integrations), and we will be supporting you along the journey. Just take a look at our [Contributing guide](https://github.com/CrowdDotDev/crowd.dev/blob/main/CONTRIBUTING.md).

## ✨ Features
@@ -64,14 +64,14 @@ crowd.dev is open-source, built with developers in mind, available for both host
- **User enrichment** with 25+ attributes, including emails, social profiles, work experience, and technical skills. [cloud only]
- **Organization enrichment** with 50+ attributes, including industry, headcount, and revenue. [cloud only]
- **Sentiment analysis and conversation detection** to stay on top of what's going on in your open source community. [cloud only]
- **[Eagle Eye](https://www.crowd.dev/eagle-eye)**: Monitor dev-focussed community platforms to find relevant content to engage with, helping you to gain developers’ mindshare and grow your community organically [cloud only]
- **[Eagle Eye](https://www.crowd.dev/eagle-eye)**: Monitor dev-focused community platforms to find relevant content to engage with, helping you to gain developers’ mindshare and grow your community organically [cloud only]


## 🚀 Getting started

### Cloud version

Our <a href="https://app.crowd.dev/">cloud version</a> is a fast, easy and free way to get started with crowd.dev.
Our <a href="https://app.crowd.dev/">cloud version</a> is a fast, easy, and free way to get started with crowd.dev.

### Self-hosted version

53 changes: 50 additions & 3 deletions backend/package-lock.json

Some generated files are not rendered by default.

3 changes: 2 additions & 1 deletion backend/package.json
@@ -54,10 +54,10 @@
"@crowd/common": "file:../services/libs/common",
"@crowd/integrations": "file:../services/libs/integrations",
"@crowd/logging": "file:../services/libs/logging",
"@crowd/tracing": "file:../services/libs/tracing",
"@crowd/opensearch": "file:../services/libs/opensearch",
"@crowd/redis": "file:../services/libs/redis",
"@crowd/sqs": "file:../services/libs/sqs",
"@crowd/tracing": "file:../services/libs/tracing",
"@crowd/types": "file:../services/libs/types",
"@cubejs-client/core": "^0.30.4",
"@google-cloud/storage": "5.3.0",
@@ -97,6 +97,7 @@
"erlpack": "^0.1.4",
"express": "4.17.1",
"express-rate-limit": "6.5.1",
"fast-levenshtein": "^3.0.0",
"formidable-serverless": "1.1.1",
"he": "^1.2.0",
"helmet": "4.1.1",
2 changes: 1 addition & 1 deletion backend/src/bin/discord-ws.ts
@@ -58,7 +58,7 @@ async function spawnClient(

logger.info({ payload }, 'Processing Discord WS Message!')

tracer.startActiveSpan('ProcessDiscordWSMessage', async (span) => {
await tracer.startActiveSpan('ProcessDiscordWSMessage', async (span) => {
try {
const integration = (await IntegrationRepository.findByIdentifier(
guildId,
2 changes: 1 addition & 1 deletion backend/src/bin/job-generator.ts
@@ -10,7 +10,7 @@ for (const job of jobs) {
const cronJob = new CronJob(
job.cronTime,
async () => {
tracer.startActiveSpan(`ProcessingJob:${job.name}`, async (span) => {
await tracer.startActiveSpan(`ProcessingJob:${job.name}`, async (span) => {
log.info({ job: job.name }, 'Triggering job.')
try {
await job.onTrigger(log)
4 changes: 2 additions & 2 deletions backend/src/bin/nodejs-worker.ts
@@ -57,7 +57,7 @@ async function handleDelayedMessages() {
const message = await receive(true)

if (message) {
tracer.startActiveSpan('ProcessDelayedMessage', async (span) => {
await tracer.startActiveSpan('ProcessDelayedMessage', async (span) => {
try {
const msg: NodeWorkerMessageBase = JSON.parse(message.Body)
const messageLogger = getChildLogger('messageHandler', serviceLogger, {
@@ -130,7 +130,7 @@ async function handleMessages() {
handlerLogger.info('Listening for messages!')

const processSingleMessage = async (message: Message): Promise<void> => {
tracer.startActiveSpan('ProcessMessage', async (span) => {
await tracer.startActiveSpan('ProcessMessage', async (span) => {
const msg: NodeWorkerMessageBase = JSON.parse(message.Body)

const messageLogger = getChildLogger('messageHandler', serviceLogger, {
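The hunks above all make the same change: they add `await` in front of `tracer.startActiveSpan(...)`. OpenTelemetry's `startActiveSpan` returns whatever its callback returns, so an async callback yields a promise that the caller has to await if it wants to wait for the traced work. The sketch below illustrates the difference with a hypothetical stand-in tracer (`startActiveSpan`, `FakeSpan`, and the handler names are assumptions made for illustration, not code from this repository):

```typescript
// Hypothetical stand-in for a tracer. Like OpenTelemetry's startActiveSpan,
// it simply returns whatever the callback returns (here, a Promise).
interface FakeSpan {
  name: string
  ended: boolean
  end(): void
}

function startActiveSpan<T>(name: string, fn: (span: FakeSpan) => T): T {
  const span: FakeSpan = {
    name,
    ended: false,
    end: () => {
      span.ended = true
    },
  }
  return fn(span)
}

// Records the order in which things happen.
const events: string[] = []

// Simulated traced work that takes a little while to finish.
async function work(span: FakeSpan): Promise<void> {
  await new Promise<void>((resolve) => setTimeout(resolve, 10))
  events.push(`done:${span.name}`)
  span.end()
}

// Without await: the handler resolves before the traced work finishes.
async function handleWithoutAwait(): Promise<void> {
  startActiveSpan('NotAwaited', work)
  events.push('handler-returned:NotAwaited')
}

// With await: the handler resolves only after the traced work finishes.
async function handleWithAwait(): Promise<void> {
  await startActiveSpan('Awaited', work)
  events.push('handler-returned:Awaited')
}

async function demo(): Promise<string[]> {
  await handleWithoutAwait()
  await handleWithAwait()
  return events
}
```

In the un-awaited variant the handler reports completion first and the work finishes later, so a caller that loops over messages or cron triggers would move on while processing is still in flight, and any rejection from the callback would go unobserved by the caller.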
74 changes: 51 additions & 23 deletions backend/src/database/repositories/organizationRepository.ts
@@ -1,4 +1,5 @@
import lodash, { chunk } from 'lodash'
import { get as getLevenshteinDistance } from 'fast-levenshtein'
import validator from 'validator'
import { FieldTranslatorFactory, OpensearchQueryParser } from '@crowd/opensearch'
import { PageData } from '@crowd/common'
@@ -27,6 +28,11 @@ import SegmentRepository from './segmentRepository'

const { Op } = Sequelize

interface IOrganizationIdentityOpensearch {
string_platform: string
string_name: string
}

interface IOrganizationPartialAggregatesOpensearch {
_source: {
uuid_organizationId: string
@@ -38,10 +44,12 @@ interface IOrganizationPartialAggregatesOpensearch {
}
}

interface IOrganizationIdOpensearch {
interface ISimilarOrganization {
_score: number
_source: {
uuid_organizationId: string
nested_identities: IOrganizationIdentityOpensearch[]
nested_weakIdentities: IOrganizationIdentityOpensearch[]
}
}

@@ -54,8 +62,6 @@ interface IOrganizationNoMerge {
noMergeId: string
}

type MinMaxScores = { maxScore: number; minScore: number }

class OrganizationRepository {
static async filterByPayingTenant(
tenantId: string,
@@ -1162,28 +1168,48 @@ class OrganizationRepository {
return 10
}

const normalizeScore = (max: number, min: number, score: number): number => {
if (score > 100) {
return 1
}
const calculateSimilarity = (
primaryOrganization: IOrganizationPartialAggregatesOpensearch,
similarOrganization: ISimilarOrganization,
): number => {
let smallestEditDistance: number = null

if (max === min) {
return (40 + Math.floor(Math.random() * 26) - 10) / 100
}

const normalizedScore = (score - min) / (max - min)
let similarPrimaryIdentity: IOrganizationIdentityOpensearch = null

// randomize the cases where score === max and score === min
if (normalizedScore === 1) {
return Math.floor(Math.random() * (76 - 50) + 50) / 100
// find the smallest edit distance between both identity arrays
for (const primaryIdentity of primaryOrganization._source.nested_identities) {
// similar organization has a weakIdentity as one of primary organization's strong identity, return score 95
if (
similarOrganization._source.nested_weakIdentities.length > 0 &&
similarOrganization._source.nested_weakIdentities.some(
(weakIdentity) =>
weakIdentity.string_name === primaryIdentity.string_name &&
weakIdentity.string_platform === primaryIdentity.string_platform,
)
) {
return 0.95
}
for (const secondaryIdentity of similarOrganization._source.nested_identities) {
const currentLevenstheinDistance = getLevenshteinDistance(
primaryIdentity.string_name,
secondaryIdentity.string_name,
)
if (smallestEditDistance === null || smallestEditDistance > currentLevenstheinDistance) {
smallestEditDistance = currentLevenstheinDistance
similarPrimaryIdentity = primaryIdentity
}
}
}

// normalization is resolved to 0, randomize it
if (normalizedScore === 0) {
return Math.floor(Math.random() * (41 - 20) + 20) / 100
// calculate similarity percentage
const identityLength = similarPrimaryIdentity.string_name.length

if (identityLength < smallestEditDistance) {
// if levensthein distance is bigger than the word itself, it might be a prefix match, return medium similarity
return (Math.floor(Math.random() * 21) + 20) / 100
}

return normalizedScore
return Math.floor(((identityLength - smallestEditDistance) / identityLength) * 100) / 100
}

const tenant = SequelizeRepository.getCurrentTenant(options)
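The hunk above replaces the score-normalizing helper with an edit-distance similarity. As a reading aid, here is a minimal self-contained sketch of that logic. Several things are assumptions made for illustration: the `Identity` shape is flattened, `editDistance` is a plain dynamic-programming stand-in for the `get` function from `fast-levenshtein`, the random source is injectable so the prefix branch can be tested, and the guard for empty inputs is an addition, since the diff would dereference `null` when there are no identities to compare.

```typescript
interface Identity {
  platform: string
  name: string
}

// Plain dynamic-programming Levenshtein distance; a stand-in for the
// `get` function the diff imports from fast-levenshtein.
function editDistance(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j)
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i]
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
    }
    prev = curr
  }
  return prev[b.length]
}

function similarity(
  primary: Identity[],
  candidate: Identity[],
  candidateWeak: Identity[],
  random: () => number = Math.random,
): number {
  let smallest: number | null = null
  let closest: Identity | null = null

  for (const p of primary) {
    // A weak identity of the candidate equals a strong identity of the primary.
    if (candidateWeak.some((w) => w.name === p.name && w.platform === p.platform)) {
      return 0.95
    }
    // Track the smallest edit distance across all identity name pairs.
    for (const c of candidate) {
      const d = editDistance(p.name, c.name)
      if (smallest === null || d < smallest) {
        smallest = d
        closest = p
      }
    }
  }

  // Assumption (not in the diff): nothing comparable means no similarity.
  if (smallest === null || closest === null || closest.name.length === 0) {
    return 0
  }

  const length = closest.name.length
  if (length < smallest) {
    // Distance exceeds the name length: possibly a prefix match,
    // so return a random value between 0.20 and 0.40.
    return (Math.floor(random() * 21) + 20) / 100
  }
  // Share of the name that survives the edits, truncated to two decimals.
  return Math.floor(((length - smallest) / length) * 100) / 100
}
```

For example, comparing `crowddev` with `crowd.dev` gives an edit distance of 1 over a name length of 8, which truncates to 0.87.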
@@ -1433,17 +1459,18 @@ class OrganizationRepository {
collapse: {
field: 'uuid_organizationId',
},
_source: ['uuid_organizationId'],
_source: ['uuid_organizationId', 'nested_identities', 'nested_weakIdentities'],
}

const organizationsToMerge: IOrganizationIdOpensearch[] =
const organizationsToMerge: ISimilarOrganization[] =
(
await options.opensearch.search({
index: OpenSearchIndex.ORGANIZATIONS,
body: sameOrganizationsQueryBody,
})
).body?.hits?.hits || []

/*
const { maxScore, minScore } = organizationsToMerge.reduce<MinMaxScores>(
(acc, organizationToMerge) => {
if (!acc.minScore || organizationToMerge._score < acc.minScore) {
@@ -1458,10 +1485,11 @@ class OrganizationRepository {
},
{ maxScore: null, minScore: null },
)
*/

for (const organizationToMerge of organizationsToMerge) {
yieldChunk.push({
similarity: normalizeScore(maxScore, minScore, organizationToMerge._score),
similarity: calculateSimilarity(organization, organizationToMerge),
organizations: [
organization._source.uuid_organizationId,
organizationToMerge._source.uuid_organizationId,
@@ -1541,7 +1569,7 @@ class OrganizationRepository {
organizations: [i, organizationToMergeResults[idx]],
similarity: orgs[idx].similarity,
}))
return { rows: result, count: orgs[0].total_count / 2, limit, offset }
return { rows: result, count: orgs[0].total_count, limit, offset }
}

return { rows: [{ organizations: [], similarity: 0 }], count: 0, limit, offset }
@@ -454,7 +454,7 @@ const rules = {
const $externalResults = ref({});
const $v = useVuelidate(rules, form, { $externalResults });
const $v = useVuelidate(rules, form, { $externalResults, $stopPropagation: true });
watch(
() => props.integration,
@@ -172,6 +172,7 @@ function findPlatform(platform) {
function onInputChange(newValue, key, value, index) {
model.value.identities[index] = {
...props.modelValue.identities[index],
name: newValue,
url: newValue.length ? `https://${value.urlPrefix}${newValue}` : null,
};
}