LIMIT problems: linear

Showing posts with label linear. Show all posts

Wednesday, March 21, 2012

linear regression with nested explanation variable

We are trying to create a model of linear regression with nested table. We used the create mining model sintax as follow :

create mining model rate_plan3002_nested2

( CUST_cycle LONG KEY,

VOICE_CHARGES double CONTINUOUS predict,

DUR_PARTNER_GRP_1 double regressor CONTINUOUS ,

nested_taarif_time_3002 table

( CUST_cycle long CONTINUOUS,

TARIFF_TIME text key,

TARIFF_VOICE_DUR_ALL double regressor CONTINUOUS

)

) using microsoft_linear_regression

INSERT INTO MINING STRUCTURE [rate_plan3002_nested2_Structure]

(CUST_cycle ,

VOICE_CHARGES ,

DUR_PARTNER_GRP_1 ,

[nested_taarif_time_3002](SKIP,TARIFF_TIME ,TARIFF_VOICE_DUR_ALL)

)

SHAPE {

OPENQUERY([Cell],

'SELECT CUST_cycle ,

VOICE_CHARGES ,

DUR_PARTNER_GRP_1

FROM dbo.panel_anality_3002

order by CUST_cycle ')}

APPEND

({OPENQUERY([Cell],

'select CUST_cycle,

TARIFF_TIME,

CYCLE_DATE

from dbo.nested_taarif_time_3002

order by CUST_cycle,TARIFF_TIME')

}

relate CUST_cycle to CUST_cycle

) as nested_taarif_time_3002

The results we got are a model with intercept only. if we don't use the nested variable (the red line) we get a rigth model . (we had more variable ....)

Is there a way to do this regression correctly?

Thanks,

Dror

Hi Dror,

You could remove the "regressor" flag from the nested table column (in the create mining model statement) if this column is not indented to be part of the regression equation.

Thanks,

Dana Cristofor

|||

Thanks Dana,

the problem is that ido want it to be part of the regression

otherwise i don't need the nested table

Thanks

|||

I'm investigating an issue - let me work on this and get back to you.

Thanks

|||

I think I may have found the problem. I created a simple model with processed and had expected regressions. I then created the same model using a nested table and processed and got a constant result back - very confusing. I then tried creating an additional model in the nested structure that would be identical to the first non-nested model I created, and again got no regressions - startling!

What I found out was that for some reason (possibly a bug) when I added a nested table, the wizard did not add the "regressor" flag to any of my continuous inputs. Once I manually added the regressor flag and reprocessed, I got the expected regressions in my output.

Please check the regressor flag on the model columns and let me know if this helps for you. To set the regressor flag, go to the Mining Models tab of the Data Mining designer, click the cell representing the input column under the mining model (not the mining structure) and view the properties. The regressor flag is a possible option for the mining model column.

Thanks

-Jamie

|||

Thanks Jamie,

we added the regressor flag in the minig models tab, and the we got the same results:

when we add the regressor flag to the column in the nested table, we get an intercept only model.

if we put the regressor flag on the nested table only, and not on the column of the table, we get the same regression as if we didn't use the nested table.

is there a way to solve it?

we would also like to run the model from the DMXquery of the management studio, is there a way to get the script of the model from visual studio?

Thanks,

Dror

|||

I wonder if it's possible that there's too much "noise" with the nested table? I've tried with a degenerate case to prove that there's no obvious bug in the software preventing nested regressors from working. Can you also try the same model with the decision tree algorithm and see what happens?

You can get the DMX form of the model by following the instructions at the tip and trick here: http://www.sqlserverdatamining.com/DMCommunity/TipsNTricks/3652.aspx

Linear Regression for column values

This is a real challenge. I hope someone is smart enough to know how
to do this.

I have a table

TABLE1
[Column 1- 2001]
[Column 2- 2002]
[Column 3- 2003]
[Column 4 - 2004]
[Column 5 - 2005]
[Column 6 - 2006]
[Column 7 - Slope]

[2001][2002][2003][2004][2005][2006] [Slope]
[1] [2] [3] [4] [5] [6] [1]
[1.2] [.9] [4] [5] [5.4] [6.2] [?]

Slope is defined as "M" in the equation y=mx+b

I need a way a finding the linear equation that best fits the points so
I can have SQL calculate the slope.

Are there any smart people around that would know how to do this?

thankssql guy123 (stoppal@.hotmail.com) writes:

Quote:

Originally Posted by

This is a real challenge. I hope someone is smart enough to know how
to do this.
>
I have a table
>
TABLE1
[Column 1- 2001]
[Column 2- 2002]
[Column 3- 2003]
[Column 4 - 2004]
[Column 5 - 2005]
[Column 6 - 2006]
[Column 7 - Slope]
>
>
[2001][2002][2003][2004][2005][2006] [Slope]
[1] [2] [3] [4] [5] [6] [1]
[1.2] [.9] [4] [5] [5.4] [6.2] [?]
>
>
Slope is defined as "M" in the equation y=mx+b
>
I need a way a finding the linear equation that best fits the points so
I can have SQL calculate the slope.
>
Are there any smart people around that would know how to do this?

Smart? Either you did learn linear regression in school or you didn't.

I will have to admit that although I taught mathematical statistics as a
student, that I've forgotten the forumulas. But I looked them up in my
statisticcs book:

m = SUM((Xi - AVG(X))*(Yi - AVG(Y)) / SUM(SQR(Xi - AVG(X))

where Xi and Yi are the individual values for the X and Y variables.

For the computation in SQL you can of course not use SUM and AVG since
your table is turned the wrong way. The years should have been columns
instead.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx|||Thank you for your information. I never learned the actual formula,
only how to click the linear regression button in excel.

Is there a way to transform the table into the correct format? Maybe a
tranverse function in SQL?

thanks

Erland Sommarskog wrote:

Quote:

Originally Posted by

sql guy123 (stoppal@.hotmail.com) writes:

Quote:

Originally Posted by

This is a real challenge. I hope someone is smart enough to know how
to do this.

I have a table

TABLE1
[Column 1- 2001]
[Column 2- 2002]
[Column 3- 2003]
[Column 4 - 2004]
[Column 5 - 2005]
[Column 6 - 2006]
[Column 7 - Slope]

[2001][2002][2003][2004][2005][2006] [Slope]
[1] [2] [3] [4] [5] [6] [1]
[1.2] [.9] [4] [5] [5.4] [6.2] [?]

Slope is defined as "M" in the equation y=mx+b

I need a way a finding the linear equation that best fits the points so
I can have SQL calculate the slope.

Are there any smart people around that would know how to do this?

>
Smart? Either you did learn linear regression in school or you didn't.
>
I will have to admit that although I taught mathematical statistics as a
student, that I've forgotten the forumulas. But I looked them up in my
statisticcs book:
>
m = SUM((Xi - AVG(X))*(Yi - AVG(Y)) / SUM(SQR(Xi - AVG(X))
>
where Xi and Yi are the individual values for the X and Y variables.
>
For the computation in SQL you can of course not use SUM and AVG since
your table is turned the wrong way. The years should have been columns
instead.
>
>
>
--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
>
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx

|||sql guy123 (stoppal@.hotmail.com) writes:

Quote:

Originally Posted by

Thank you for your information. I never learned the actual formula,
only how to click the linear regression button in excel.
>
Is there a way to transform the table into the correct format? Maybe a
tranverse function in SQL?

There is an UNPIVOT operator in SQL 2005. But it's mainly syntactic
sugar, and you can easily unpivot without it:

SELECT year, value = CASE year WHEN 2001 THEN [2001]
WHEN 2002 THEN [2002]
...
END
FROM tbl
CROSS JOIN (SELECT 2001 UNION ALL
SELECT 2002 UNION ALL
...) AS year

My thought, though, was that you would redesign the table on a
permanent basis.

Linear Regression and tolking the coefficient for each variabel?

When using linear regression in the SQL Server 2005 Business IntelIigence Studio I interpet the information below as follow: X has a standard deviation of +- 37.046. Is it possible to obtain the standard deviation of each coefficient in the regression expression?

You can get the variance of the coefficients from node_distribution of linear regression root node. The following DMX query will return the information you need:

select flattened Node_Distribution from [Model Name].Content where Node_type = 25

The query result is a DataTable with the following column names:

Node_Distribution.ATTRIBUTE_NAMENode_Distribution.ATTRIBUTE_VALUE

Node_Distribution.SUPPORTNode_Distribution.PROBABILITYNode_Distribution.VARIANCE

Node_Distribution.VALUETYPE

And the data rows will look like this:

C5Missing00.007874015748031501

C5228.58812310000.99212598425196997913.5033

C431.9528749318584006.737398583630157

C432.1589801071577000 8

C45.29710479474068006.737398583630159

59.33039676663630090944.677980779511

The rows with a value type 7 (emphasized in above table) contain the information you need. You might need to write a small C# program to iterate through the resulting datable to get the information you want.

Thanks,

Linear Regression and tolking the coefficient for each variabel?

You can get the variance of the coefficients from node_distribution of linear regression root node. The following DMX query will return the information you need:

select flattened Node_Distribution from [Model Name].Content where Node_type = 25

The query result is a DataTable with the following column names:

Node_Distribution.ATTRIBUTE_NAMENode_Distribution.ATTRIBUTE_VALUE

Node_Distribution.SUPPORTNode_Distribution.PROBABILITYNode_Distribution.VARIANCE

Node_Distribution.VALUETYPE

And the data rows will look like this:

C5Missing00.007874015748031501

C5228.58812310000.99212598425196997913.5033

C431.9528749318584006.737398583630157

C432.1589801071577000 8

C45.29710479474068006.737398583630159

59.33039676663630090944.677980779511

Thanks,

sql

linear interpolation

my data is like this:

header | data | key
------
500 | 3.2 | 10
500 | 3.4 | 20
500 | 3.6 | 25
500 | 3.7 | 40
501 | 4.1 | 10
501 | 4.2 | 15
501 | 4.4 | 30
501 | 4.6 | 35

and what I want to do is find the median of "data", but keyed off of "key", so if my desired median is 30, I want to take the two records (data, key) nearest to key = 30, and get the average of "data".
...and do this within each "header" value.

actually, to be precise, I want the linear interpolation, so for header = 500, I want to get the (data, key) pairs of (3.6, 25) and (3.7, 40) and return the interpolated "data" value of 3.6333 (as done here (http://en.wikipedia.org/wiki/Linear_interpolation))

so for the above example the query would produce:

header | interp
------
500 | 3.633
501 | 4.4

possible, or am I crazy?looks like I may have figured it out :)

I created a function to do the interpolation, turned out to be rather easy.|||Got here late. Sorry no one could help you before you sussed it.

Dude - post your solution so others can gain too.|||thank you:beer:|||Got here late. Sorry no one could help you before you sussed it.

Dude - post your solution so others can gain too.

whoops, was going to but totally forgot.

first created an interpolation function:

create function fnInterpolate(@.header int, @.interp float)
returns float
as
begin
declare @.x0 float, @.x1 float, @.y0 float, @.y1 float, @.slope float, @.b float;

select top 1 @.x0 = key, @.y0 = data from the_table where header=@.header and key < @.interp order by key desc;

select top 1 @.x1 = key, @.y1 = data from the_table where header=@.header and key >= @.interp order by key asc;

set @.slope = (@.y1 - @.y0) / (@.x1 - @.x0);
set @.b = @.y1 - (@.slope * @.x1);
return (@.slope * @.interp) + @.b;
end

then did:

select header, fnInterpolate(header) from the_table

sorry it's a bit messy, but the actual usage is a bit more complex so I wanted to simplify. basically I select "header" from the table, and call fnInterpolate on that header with a set of conditions I specify. then fnInterpolate uses those conditions to get the two points are around the interpolation point, computes the interpolation, and returns it.

very well could be a more elegant way to do this though.

Linear equations?

Is there a way to solve a system of linear equations in Analysis Services?
Let's say I have a table with two columns for all the coefficients:
C12 - 0.30
C13 - 0.70
C21 - 0.15
C23 - 0.10
....
Is there something in Analysis Services that can do this?Can you give a bit more detail on what you are trying to achieve?|||I am trying to calculate cost allocations for budgeting, based on internal costs and a grid of distribution percentages. It comes down to solving a system of linear equations. It would have been great if I can create a report where this system is solved internally everytime the report is run.|||

There are a set of MDX functions for handling linear regression problems. It should be possible for you to use them to define one or more calculated members that take the output of one or more of the functions and calculate the cost allocations. If you look in Books Online at the MDX functions list, you'll see LinRegIntercept, LinRegPoint, LinRegSlope, LinRegVariance, and LinRegR2...

Dave Fackler

|||

And if you want to see an example of how these LinReg functions can be applied - you can check the following article: http://www.mosha.com/msolap/articles/mdxlinreg.htm

Wednesday, March 21, 2012

linear regression with nested explanation variable

Linear Regression for column values

Linear Regression and tolking the coefficient for each variabel?

Linear Regression and tolking the coefficient for each variabel?

linear interpolation

Linear equations?

LIMIT problems

Blog Archive

About Me