Monday, 25 November 2024

Combine ResNet-50 embeddings with metadata for improved accuracy in your image search system

Step 1: Install Required Libraries

Ensure you have the necessary libraries:

  • Install Milvus and its Python SDK pymilvus.
  • Install libraries for handling images and metadata (e.g., TensorFlow, Scikit-learn).
pip install pymilvus tensorflow scikit-learn

Step 2: Set Up Milvus

  1. Start Milvus:

    • Install and run Milvus locally (e.g., with the docker-compose.yml from the Milvus docs) or use a hosted service like Zilliz Cloud.
    docker-compose up -d
  2. Connect to Milvus:

    • Use the Python SDK to connect to the Milvus server.
    from pymilvus import connections

    # Connect to Milvus
    connections.connect("default", host="127.0.0.1", port="19530")
  3. Create a Collection:

    • Design a schema to store image embeddings and metadata.
    from pymilvus import CollectionSchema, FieldSchema, DataType, Collection

    # Define schema (Milvus requires exactly one primary key field)
    fields = [
        FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
        FieldSchema(name="image_embedding", dtype=DataType.FLOAT_VECTOR, dim=2048),  # ResNet-50 embeddings
        FieldSchema(name="category", dtype=DataType.VARCHAR, max_length=50),  # Categorical metadata
        FieldSchema(name="brand", dtype=DataType.VARCHAR, max_length=50),  # Categorical metadata
        FieldSchema(name="price", dtype=DataType.FLOAT),  # Numerical metadata
    ]
    schema = CollectionSchema(fields, description="Image search collection")

    # Create collection
    collection = Collection("ecommerce_image_search", schema)

Step 3: Index Data

  1. Extract and Encode Features:

    • Use ResNet-50 to extract 2048-d embeddings and encode the metadata for each product (a sketch of the extractor appears under Step 4 below).
  2. Insert Data into Milvus:

    • Combine embeddings with metadata and add them to Milvus.
    # Example data
    image_embedding = [0.1, 0.2, ..., 0.9]  # Example 2048-d embedding
    category = "shoes"
    brand = "Nike"
    price = 99.99

    # Insert data into Milvus (columns follow the schema order; id is auto-generated)
    data = [[image_embedding], [category], [brand], [price]]
    collection.insert(data)
    print("Data inserted successfully")
  3. Create Index for Faster Search:

    • Create a vector index for the image_embedding field to optimize similarity search.
    index_params = {"index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": 128}}
    collection.create_index(field_name="image_embedding", index_params=index_params)
    print("Index created successfully")

Step 4: User Input (Query)

  1. Image Upload:

    • Extract the ResNet-50 embedding from the uploaded image (a sketch of extract_features follows this list).
    query_image_embedding = extract_features("uploaded_image.jpg")
  2. Metadata Filters:

    • Get the user's metadata selections (e.g., category: "shoes", brand: "Nike").
    • Convert the filters into a Milvus boolean expression (SQL-like syntax), e.g. "category == 'shoes' && brand == 'Nike'".
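
The extract_features helper used above is not defined in this post; here is a minimal sketch, assuming TensorFlow/Keras' pretrained ResNet-50 (with include_top=False and average pooling, the output is the 2048-d vector the schema expects):

    import numpy as np
    from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
    from tensorflow.keras.preprocessing import image

    # Load once at startup; the headless, average-pooled model outputs 2048-d vectors
    model = ResNet50(weights="imagenet", include_top=False, pooling="avg")

    def extract_features(img_path):
        # Resize to the network's expected input and apply ResNet-50 preprocessing
        img = image.load_img(img_path, target_size=(224, 224))
        x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
        return model.predict(x)[0].tolist()  # plain list of 2048 floats for Milvus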

Step 5: Perform Hybrid Search

  1. Combine Image and Metadata Search:

    • Milvus supports hybrid searches using metadata filters.
    # Load the collection into memory before searching
    collection.load()

    # Define a query
    search_params = {"metric_type": "L2", "params": {"nprobe": 10}}
    filters = "category == 'shoes' && brand == 'Nike'"  # User-selected metadata filters

    # Perform search
    results = collection.search(
        data=[query_image_embedding],  # Input vector
        anns_field="image_embedding",  # Vector field name
        param=search_params,
        limit=10,  # Number of results
        expr=filters,  # Metadata filter
        output_fields=["category", "brand", "price"],  # Return metadata with each hit
    )

    # Display results
    for hit in results[0]:
        print(f"ID: {hit.id}, Distance: {hit.distance}, Metadata: {hit.entity}")

Step 6: Return Results

  1. Retrieve Matching Products:

    • Use the IDs of the search results to fetch additional product details (e.g., names, images) from your database.
  2. Display Results:

    • Show visually similar products filtered by the selected metadata on the front end.

Step 7: Refine and Optimize

  1. Weighting Embeddings and Metadata:

    • If image features are more critical, assign a higher weight to the embeddings (see the NumPy sketch after this list).
    combined_embedding = 0.8 * image_embedding + 0.2 * metadata_vector
  2. Tune Milvus Parameters:

    • Experiment with nlist and nprobe in the index parameters for better speed and accuracy.
  3. Monitor Performance:

    • Rebuild or tune indexes as data grows, and keep metadata changes in sync with the stored vectors.
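
A note on item 1: plain Python lists cannot be scaled and added element-wise, so the weighted combination needs NumPy (or similar). A minimal sketch with hypothetical vectors of equal dimension:

    import numpy as np

    # Hypothetical vectors; metadata_vector would come from encoding
    # category/brand/price into the same dimensionality as the image embedding
    image_embedding = np.random.rand(2048)
    metadata_vector = np.random.rand(2048)

    combined_embedding = 0.8 * image_embedding + 0.2 * metadata_vector
    combined_embedding /= np.linalg.norm(combined_embedding)  # keep L2 distances comparable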

Advantages of Using Milvus

  • Efficient handling of large-scale image embeddings and their metadata.
  • Native support for hybrid searches (combining vectors and metadata).
  • Scalable and integrates well with machine learning workflows.

Thank you

Sunday, 17 November 2024

Change PHP 8.1 config

1. Update php.ini

upload_max_filesize = 128M
post_max_size = 128M
max_execution_time = 300

2. Restart PHP 8.1 FPM

sudo systemctl restart php8.1-fpm

3. View Config

<?php phpinfo(); ?>
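
You can also confirm the effective values from the command line (the CLI normally reads its own php.ini, so point it at the FPM one; the path below assumes a Debian/Ubuntu layout):

php -c /etc/php/8.1/fpm/php.ini -r "echo ini_get('upload_max_filesize'), ' ', ini_get('post_max_size'), ' ', ini_get('max_execution_time');"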

Thank you

Tuesday, 5 November 2024

New features in MySQL 8

MySQL 8 introduced several powerful new features and improvements that make it a significant update over previous versions. Here are some of the key new features in MySQL 8:

1. Window Functions

  • MySQL 8 added support for window functions like ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE(), LEAD(), and LAG(). These functions allow for more advanced analytical queries, letting users calculate moving averages, ranks, cumulative sums, and more.

* Use RANK()

 SELECT s.student_id, s.student_name, SUM(p.score) AS total_score,
        RANK() OVER (ORDER BY SUM(p.score) DESC) AS rank_score
 FROM demo.student s
 JOIN demo.point p ON s.student_id = p.student_id
 GROUP BY s.student_id, s.student_name
 ORDER BY rank_score DESC;

* Error: filtering on rank_score in WHERE

 SELECT  s.student_id, s.student_name, SUM(p.score) AS total_score,
 RANK() OVER (ORDER BY SUM(p.score) DESC) AS rank_score
 FROM demo.student s
 JOIN demo.point p ON s.student_id = p.student_id
 WHERE rank_score > 3
 GROUP BY s.student_id, s.student_name
 ORDER BY rank_score DESC

=> The query fails because window functions like RANK() are evaluated after the WHERE, GROUP BY, and HAVING steps, so neither the function nor its alias rank_score is visible in the WHERE clause. Compute the ranking in a subquery or CTE, then filter on it in the outer query.

* Fix: use a subquery

SELECT student_id, student_name, total_score, rank_score
FROM (
    SELECT s.student_id, s.student_name,
           SUM(p.score) AS total_score,
           RANK() OVER (ORDER BY SUM(p.score) DESC) AS rank_score
    FROM demo.student s
    JOIN demo.point p ON s.student_id = p.student_id
    GROUP BY s.student_id, s.student_name
) AS ranked_students
WHERE rank_score > 3
ORDER BY rank_score DESC;

* Fix: use a CTE

WITH RankedStudents AS (
    SELECT s.student_id, s.student_name,
           SUM(p.score) AS total_score,
           RANK() OVER (ORDER BY SUM(p.score) DESC) AS rank_score
    FROM demo.student s
    JOIN demo.point p ON s.student_id = p.student_id
    GROUP BY s.student_id, s.student_name
)
SELECT student_id, student_name, total_score, rank_score
FROM RankedStudents
WHERE rank_score > 3
ORDER BY rank_score DESC;

Key Differences:

  • Structure: a subquery is defined inline in the FROM clause; a CTE is defined before the SELECT statement.
  • Readability: subqueries can be harder to read for complex queries; CTEs are easier to read and maintain, especially as queries grow.
  • Reuse: a subquery cannot be reused within the same query; a CTE can be referenced multiple times in the same query.
  • Scope: a subquery is available only within the query block in which it is defined; a CTE is available throughout the statement that follows it.
  • Performance: subqueries are often optimized by the engine but may be recalculated each time they are referenced; CTEs are sometimes optimized and often materialized (especially in PostgreSQL, but not always in MySQL).
  • Materialization: subquery results might be materialized only once in certain scenarios, but this is not guaranteed; in MySQL, CTEs are generally not materialized, so they are evaluated each time they are referenced.

Conclusion:

  • Use Subqueries when:
    • You have a simpler query.
    • You don’t need to reference the same intermediate result multiple times.
    • Performance is slightly more important than readability.
  • Use CTEs when:
    • You have a complex query that would benefit from being split into logical steps.
    • You need to reference the same intermediate result multiple times in the query.
    • You prioritize readability and maintainability over minor performance differences.

In Your Case:

Since both the subquery and the CTE versions are likely to perform similarly in MySQL 8.0, the decision comes down to readability. If you are working with a simple query, a subquery might suffice. If you expect the query to grow in complexity or if you want a cleaner structure, a CTE would be more maintainable.

2. Common Table Expressions (CTEs)

  • With CTEs, users can define temporary result sets to be referenced within a SELECT, INSERT, UPDATE, or DELETE query. MySQL 8 supports recursive CTEs, making it easier to work with hierarchical data, such as organizational charts or file systems.
WITH RECURSIVE cte AS (
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1 FROM cte WHERE n < 5
)
SELECT * FROM cte;

3. JSON Enhancements

  • MySQL 8 includes additional functions for handling JSON data, including JSON_TABLE() (which turns JSON data into a relational format) and JSON_OVERLAPS() (to check if two JSON arrays have common elements).
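
A short illustration of both functions; the JSON literal and column names here are made up for the example:

SELECT jt.name, jt.qty
FROM JSON_TABLE(
    '[{"name": "pen", "qty": 2}, {"name": "book", "qty": 1}]',
    '$[*]' COLUMNS (
        name VARCHAR(50) PATH '$.name',
        qty  INT         PATH '$.qty'
    )
) AS jt;

-- Returns 1 because the arrays share the element 3
SELECT JSON_OVERLAPS('[1, 2, 3]', '[3, 4]');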

4. Improved Indexing with Invisible Indexes

  • MySQL 8 introduced invisible indexes, which let you mark an index as invisible to the query optimizer. This is useful for testing how queries perform without a specific index, without fully deleting it.
ALTER TABLE table_name ALTER INDEX index_name INVISIBLE;

5. Roles for Better Access Control

  • MySQL 8 added support for roles, allowing for easier management of user permissions. Roles can be created with specific privileges and then assigned to users, simplifying complex privilege assignments.
CREATE ROLE read_only;
GRANT SELECT ON database.* TO read_only;
GRANT read_only TO user_name;

6. Generated Columns

  • MySQL 8 introduced generated (computed) columns, which can store data based on expressions or other column values. Generated columns can be virtual (calculated on-the-fly) or stored (physically stored in the table).
ALTER TABLE employees
    ADD COLUMN full_name VARCHAR(255)
    GENERATED ALWAYS AS (CONCAT(first_name, ' ', last_name)) VIRTUAL;

7. Better Support for Geographic Data with Spatial Indexing

  • MySQL 8 improved spatial data support by adding spatial indexes for InnoDB tables, making it more efficient for geographic or location-based queries.
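
A minimal sketch of both features; the places table and its coordinates are hypothetical:

-- InnoDB spatial index (MySQL 8 requires an SRID on the indexed column)
CREATE TABLE places (
    id INT NOT NULL PRIMARY KEY,
    location POINT NOT NULL SRID 4326,
    SPATIAL INDEX (location)
) ENGINE=InnoDB;

-- Distance in meters between two longitude/latitude points on a sphere
SELECT ST_Distance_Sphere(POINT(105.8542, 21.0285), POINT(106.6297, 10.8231)) AS meters;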

8. Atomic DDL Statements

  • In MySQL 8, DDL (Data Definition Language) operations are atomic and crash-safe, meaning that a failure during a CREATE, ALTER, DROP, or RENAME operation can be rolled back automatically.
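
One concrete consequence, assuming t1 and t2 are placeholder tables: a multi-table DROP is now all-or-nothing.

-- In MySQL 8, if t2 does not exist, the whole statement fails and t1 is kept
-- (in MySQL 5.7, t1 would already have been dropped)
DROP TABLE t1, t2;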

9. Improved UTF-8 Support with utf8mb4

  • MySQL 8 made utf8mb4 the default character set, supporting a wider range of Unicode characters, including emojis. The utf8mb4 encoding is now recommended for improved compatibility with modern web applications.
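
A quick way to verify the defaults and store a character outside the old 3-byte utf8 range (the emoji_demo table is just an illustration):

SELECT @@character_set_server, @@collation_server;  -- utf8mb4 / utf8mb4_0900_ai_ci by default

CREATE TABLE emoji_demo (txt VARCHAR(10)) CHARACTER SET utf8mb4;
INSERT INTO emoji_demo VALUES ('🙂');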

10. Descending Indexes

  • MySQL 8 allows users to create descending indexes, optimizing queries that need sorted data in descending order without needing additional sorting at query time.
CREATE INDEX idx_desc ON table_name(column_name DESC);

11. Histograms for Better Query Optimization

  • MySQL 8 supports histograms, which provide more accurate statistics for the query optimizer, helping it choose the best execution plans for complex queries. Histograms work especially well for non-indexed columns.
ANALYZE TABLE table_name UPDATE HISTOGRAM ON column_name;

12. Improved Performance Schema and Sys Schema

  • MySQL 8 expanded the Performance Schema and introduced the Sys Schema as a helper schema, making it easier to monitor and optimize database performance. The Sys Schema provides user-friendly views for checking performance metrics.
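
For example, the sys schema's statement_analysis view (pre-sorted by total latency) summarizes statement digests collected by the Performance Schema:

SELECT query, exec_count, total_latency
FROM sys.statement_analysis
LIMIT 5;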

13. Resource Groups

  • Resource Groups in MySQL 8 allow users to allocate specific CPU resources to certain queries or workloads, improving control over query performance for different tasks.
CREATE RESOURCE GROUP rg1 TYPE = USER VCPU = 1,2;

14. Optimizer Hints with SET_VAR

  • MySQL 8 introduced the SET_VAR hint, allowing users to override session-level system variables for specific queries, such as changing the max_execution_time for a single query.
SELECT /*+ SET_VAR(max_execution_time=5000) */ * FROM large_table;

15. Improved InnoDB and Backup Features

  • Enhancements to InnoDB include improved support for bulk loading, reduced redo log size, and better crash recovery. Backup performance and reliability are also improved with these changes.

These features make MySQL 8 far more powerful, flexible, and performant for complex, high-demand applications.

==============

Example database

CREATE TABLE IF NOT EXISTS `point` (
  `point_id` int NOT NULL,
  `student_id` int DEFAULT NULL,
  `subject` varchar(50) DEFAULT NULL,
  `score` decimal(5,2) DEFAULT NULL,
  PRIMARY KEY (`point_id`),
  KEY `student_id` (`student_id`),
  CONSTRAINT `point_ibfk_1` FOREIGN KEY (`student_id`) REFERENCES `student` (`student_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

-- Dumping data for table demo.point: ~10 rows (approximately)
INSERT INTO `point` (`point_id`, `student_id`, `subject`, `score`) VALUES
    (1, 1, 'Math', 9.00),
    (2, 1, 'English', 9.00),
    (3, 2, 'Math', 8.00),
    (4, 2, 'English', 8.00),
    (5, 3, 'Math', 8.00),
    (6, 3, 'English', 8.00),
    (7, 4, 'Math', 6.00),
    (8, 4, 'English', 6.00),
    (9, 5, 'Math', 5.00),
    (10, 5, 'English', 5.00);

-- Dumping structure for table demo.student
CREATE TABLE IF NOT EXISTS `student` (
  `student_id` int NOT NULL,
  `student_name` varchar(50) DEFAULT NULL,
  `age` int DEFAULT NULL,
  PRIMARY KEY (`student_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

-- Dumping data for table demo.student: ~5 rows (approximately)
INSERT INTO `student` (`student_id`, `student_name`, `age`) VALUES
    (1, 'Alice', 20),
    (2, 'Bob', 22),
    (3, 'Charlie', 21),
    (4, 'David', 23),
    (5, 'Eva', 20);

Thank you

SQL clauses and the order in which they are executed

SQL Execution Order (Logical Order):

  1. FROM:
    • This clause specifies the tables (or other data sources) from which to retrieve data. It happens first because it determines which rows are available to the rest of the query.
  2. JOIN:
    • If you're using JOIN, it’s performed immediately after FROM. The join operation combines data from multiple tables based on a condition.
  3. WHERE:
    • Filters the rows returned by the FROM and JOIN clauses based on a condition. The rows are selected after the join operations.
  4. GROUP BY:
    • Groups the rows based on the columns you specify. After filtering (WHERE), SQL groups the data based on specified columns.
  5. HAVING:
    • Similar to WHERE, but it is used to filter groups created by GROUP BY. You would use HAVING to filter out groups based on aggregated results, such as SUM, COUNT, etc.
  6. SELECT:
    • Selects the columns to be displayed in the final output. The SELECT clause is executed after FROM, JOIN, WHERE, and GROUP BY (and HAVING if used), because it operates on the filtered and grouped data.
  7. DISTINCT:
    • If you’re using DISTINCT, it happens immediately after SELECT to remove duplicate rows from the result set.
  8. ORDER BY:
    • The data is sorted based on the columns specified in the ORDER BY clause. Sorting occurs after SELECT and DISTINCT (if used), and it organizes the rows in ascending or descending order.
  9. LIMIT / OFFSET:
    • These clauses are used to restrict the number of rows returned. They occur last, after all sorting and selection of rows are done.

Example:

SELECT DISTINCT s.student_id, s.student_name, SUM(p.score) AS total_score
FROM student s
JOIN point p ON s.student_id = p.student_id
WHERE s.age > 18
GROUP BY s.student_id, s.student_name
HAVING SUM(p.score) > 50
ORDER BY total_score DESC
LIMIT 1 OFFSET 2;

Execution Order for the Above Query:

  1. FROM: Start with the student and point tables.
  2. JOIN: Combine student and point based on the student_id column.
  3. WHERE: Filter the students who are older than 18 years.
  4. GROUP BY: Group the remaining rows by student_id and student_name.
  5. HAVING: Filter the groups where the total score is greater than 50.
  6. SELECT: Select the columns student_id, student_name, and the aggregated SUM(p.score) as total_score.
  7. DISTINCT: Remove any duplicate rows if necessary.
  8. ORDER BY: Sort the results by total_score in descending order.
  9. LIMIT/OFFSET: Select the 3rd row in the sorted result (because of LIMIT 1 OFFSET 2).

Thank you

Big data import process

Big data import process (a Laravel sketch follows the list):

  1. Stream Data Line-by-Line: Read the file line-by-line to keep memory usage low.

  2. Batch into Groups of 1,000: Accumulate rows into batches of 1,000 for efficient bulk inserts/updates.

  3. Queue Each Batch for Processing: Dispatch each batch to a job queue, enabling asynchronous processing and distributing the load.

  4. Use Transactions for Data Consistency: Wrap database operations in a transaction within each job to ensure that either all rows are processed or none, preserving data integrity.

  5. Execute Multiple Jobs Concurrently: Configure Laravel’s queue workers to process multiple jobs in parallel, speeding up the import process.
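
A minimal sketch of the whole pipeline, assuming a Laravel app with a queue driver configured; the job class, CSV path, and products table/columns are hypothetical:

<?php

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Support\Facades\DB;

class ImportBatchJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable;

    public function __construct(private array $rows) {}

    public function handle(): void
    {
        // Step 4: one transaction per batch, so each batch is all-or-nothing
        DB::transaction(fn () => DB::table('products')->insert($this->rows));
    }
}

// Steps 1-3: stream the file line-by-line and queue batches of 1,000
$handle = fopen(storage_path('imports/products.csv'), 'r');
$batch = [];
while (($line = fgets($handle)) !== false) {
    [$sku, $name, $price] = str_getcsv(trim($line));
    $batch[] = ['sku' => $sku, 'name' => $name, 'price' => $price];
    if (count($batch) === 1000) {
        ImportBatchJob::dispatch($batch);  // pushed onto the queue
        $batch = [];
    }
}
if ($batch) {
    ImportBatchJob::dispatch($batch);      // flush the final partial batch
}
fclose($handle);

// Step 5: run several workers in parallel, e.g. multiple processes of:
//   php artisan queue:work --tries=3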

Thank you

Golang Advanced Interview Q&A